Sparse Sums of Positive Semidefinite Matrices
Abstract

Recently there has been much interest in "sparsifying" sums of rank one matrices: modifying the
coefficients such that only a few are nonzero, while approximately preserving the matrix that results
from the sum. Results of this sort have found applications in many different areas, including sparsifying
graphs. In this paper we consider the more general problem of sparsifying sums of positive semidefinite
matrices that have arbitrary rank.

We give several algorithms for solving this problem. The first algorithm is based on the method of 
Batson, Spielman and Srivastava (2009). The second algorithm is based on the matrix multiplicative 
weights update method of Arora and Kale (2007). We also highlight an interesting connection between 
these two algorithms. 

Our algorithms have numerous applications. We show how they can be used to construct graph
sparsifiers with auxiliary constraints, sparsifiers of hypergraphs, and sparse solutions to semidefinite
programs.

1 Introduction

A sparsifier of a graph is a subgraph that approximately preserves some structural properties of the graph. 
The original work in this area studied cut sparsifiers, which are weighted subgraphs that approximate every 
cut arbitrarily well. The celebrated work of Benczúr and Karger [5, 6] proved that every undirected graph
with n vertices and m edges (and potentially non-negative weights on its edges) has a subgraph with only
$O(n \log n/\varepsilon^2)$ edges (and new weights on those edges) such that, for every cut, the weight of the cut in
the original graph and its subgraph agree up to a multiplicative factor of $(1 \pm \varepsilon)$. Benczúr and Karger also
gave a randomized algorithm to construct a cut sparsifier in $O(m/\varepsilon^2)$ time. Recent work has extended and
improved their algorithm in various ways [10, 11, 12, 14, 15].

Spielman and Teng [39] introduced spectral sparsifiers, which are weighted subgraphs such that the
quadratic forms defined by the Laplacians of the graph and the sparsifier agree up to a multiplicative factor
of $(1 \pm \varepsilon)$. Spectral sparsifiers are also cut sparsifiers, as can be seen by evaluating these quadratic forms at
$\{0, 1\}$-vectors. They proved that every undirected graph with n vertices and m edges (and potentially
non-negative weights on its edges) has a spectral sparsifier with only $n \,\mathrm{polylog}(n)/\varepsilon^2$ edges (and new weights
on those edges). Spielman and Srivastava [38] reduce the graph sparsification problem to the following
abstract problem in matrix theory.

Problem 1. Let $v_1, \ldots, v_m \in \mathbb{R}^n$ be vectors and let $B = \sum_i v_i v_i^T$. Given $\varepsilon \in (0, 1)$, find a vector $y \in \mathbb{R}^m$
with small support such that $y \geq 0$ and

$B \preceq \sum_i y_i v_i v_i^T \preceq (1 + \varepsilon) B$. (1)

(Here the notation $X \preceq Y$ means that the matrix $Y - X$ is positive semidefinite.)
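The remark above that spectral sparsifiers are also cut sparsifiers can be illustrated numerically. The following is a minimal sketch, assuming a small weighted graph of our own choosing (the edge list and the function names are illustrative, not from the paper): it evaluates the Laplacian quadratic form, written as a sum of rank one terms as in Problem 1, at the indicator vector of a vertex set and compares it with the weight of the corresponding cut.

```python
def laplacian_quadratic_form(edges, x):
    """Return x^T L x, where L = sum over edges of w * (e_i - e_j)(e_i - e_j)^T."""
    # Each rank one term contributes w * ((e_i - e_j)^T x)^2 = w * (x_i - x_j)^2.
    return sum(w * (x[i] - x[j]) ** 2 for (i, j, w) in edges)


def cut_weight(edges, side):
    """Total weight of the edges with exactly one endpoint in `side`."""
    return sum(w for (i, j, w) in edges if (i in side) != (j in side))


# Hypothetical example graph on vertices 0..3, given as (i, j, weight) triples.
edges = [(0, 1, 2.0), (1, 2, 1.5), (2, 3, 4.0), (0, 3, 0.5), (0, 2, 3.0)]
side = {0, 2}
indicator = [1.0 if v in side else 0.0 for v in range(4)]

quadratic = laplacian_quadratic_form(edges, indicator)
cut = cut_weight(edges, side)
```

Since the two quantities coincide for every vertex set, any reweighting that preserves the quadratic form up to a factor of $(1 + \varepsilon)$ preserves every cut up to the same factor.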

Spielman and Srivastava [38] observe that Problem 1 can be solved using known concentration bounds
on operator-valued random variables, specifically Rudelson's sampling lemma [32, 33]. This approach
yields a vector y with support size $O(n \log n/\varepsilon^2)$, and therefore yields a construction of spectral sparsifiers
with $O(n \log n/\varepsilon^2)$ edges. Their algorithm relies on the linear system solver of Spielman and Teng [39],
which was significantly simplified by Koutis, Miller and Peng [24]. Recent work [23] has improved the
space usage of Spielman and Srivastava's algorithm.

In subsequent work, Batson, Spielman and Srivastava [4] give a deterministic algorithm that solves
Problem 1 and produces a vector y with support size $O(n/\varepsilon^2)$. Consequently they obtain improved spectral
sparsifiers with $O(n/\varepsilon^2)$ edges. This work led to important progress in metric embeddings [29, 34], convex
geometry |^] and Banach space theory [37]. 

In this paper, we focus on a more general problem. 

Problem 2. Let $B_1, \ldots, B_m$ be symmetric, positive semidefinite matrices of size $n \times n$ and let $B = \sum_i B_i$.
Given $\varepsilon \in (0, 1)$, find a vector $y \in \mathbb{R}^m$ with small support such that $y \geq 0$ and

$B \preceq \sum_i y_i B_i \preceq (1 + \varepsilon) B$. (2)

This problem can also be solved by known concentration bounds: Ahlswede and Winter [1] give a
method for generalizing Chernoff-like bounds to operator-valued random variables, and one of their
theorems [1, Theorem 19] directly yields a solution to Problem 2. (Other expositions of these results also exist
[41, 16].) This approach yields a vector y with support size $O(n \log n/\varepsilon^2)$. See Section 3 for more details.

This paper gives two improved solutions to Problem 2. Our interest in this topic is motivated by several
applications, such as constructing sparsifiers with certain auxiliary properties and sparsifiers for
hypergraphs. We discuss these applications in Section 1.2.

1.1 Our Results

We give several efficient algorithms for solving Problem 2. Our strongest solution is:

Theorem 3. Let $B_1, \ldots, B_m$ be symmetric, positive semidefinite matrices of size $n \times n$ and arbitrary rank.
Set $B := \sum_i B_i$. For any $\varepsilon \in (0, 1)$, there is a deterministic algorithm to construct a vector $y \in \mathbb{R}^m$ with
$O(n/\varepsilon^2)$ nonzero entries such that $y \geq 0$ and

$B \preceq \sum_i y_i B_i \preceq (1 + \varepsilon) B$.

The algorithm runs in $O(mn^3/\varepsilon^2)$ time. Moreover, the result continues to hold if the input matrices
$B_1, \ldots, B_m$ are Hermitian and positive semidefinite.

Our proof of Theorem 3 is quite simple and builds on results of Batson, Spielman and Srivastava [4].
We remark that the assumption that the $B_i$'s are positive semidefinite cannot be removed; see Appendix D.

We also give a second solution to Problem 2 which is quantitatively weaker, although it is based on
very general machinery which might prove useful in further applications or generalizations of Problem 2.
This second solution is based on the matrix multiplicative weights update method (MMWUM) of Arora and
Kale [3, 22]. By a black-box application of their theorems we obtain a deterministic algorithm to construct a
vector y with $O(n \log n/\varepsilon^3)$ nonzero entries. By slightly refining their analysis we can improve the number
of nonzero entries to $O(n \log n/\varepsilon^2)$. We remark that Orecchia and Vishnoi [30] have used MMWUM for
solving the balanced separator problem; this can be used as a subroutine in Spielman and Teng's algorithm
for constructing spectral sparsifiers.

Another virtue of our second solution is that it illustrates that the surprising Batson-Spielman-Srivastava 
(BSS) algorithm is actually closely related to MMWUM. In particular, the algorithms underlying our two 
solutions are identical, except for the use of slightly different potential functions. We explain this connection 
in Section [H 

1.2 Applications 

In this section, we present several applications of Problem 2. Proofs are given in Appendix A.

Sparsifiers with costs. 

Corollary 4. Let $G = (V, E)$ be a graph, let $w \colon E \to \mathbb{R}_+$ be a weight function, and let $c_1, \ldots, c_k \colon E \to \mathbb{R}_+$
be cost functions, with $k = O(n)$. Let $\mathcal{L}_G(w)$ denote the Laplacian matrix for graph G with weight function
w. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a subgraph H of G
and a weight function $w_H \colon E(H) \to \mathbb{R}_+$ such that

$\mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon) \mathcal{L}_G(w)$,

$\sum_{e \in E} w_e c_{i,e} \leq \sum_{e \in E(H)} w_{H,e} c_{i,e} \leq (1 + \varepsilon) \sum_{e \in E} w_e c_{i,e}$ for all i,

and $|E(H)| = O(n/\varepsilon^2)$.

The inequalities $\mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon) \mathcal{L}_G(w)$ are equivalent to the condition that the subgraph
H (with weights $w_H$) is a spectral sparsifier of G (with weights w). We remark that existing methods for
producing sparsifiers have low probability of approximately satisfying even a single cost function (i.e., the
case k = 1).

One potentially interesting application of sparsifiers with costs is as follows. 



Corollary 5 (Rainbow Sparsifiers). Let $G = (V, E)$ be a graph and let $w \colon E \to \mathbb{R}_+$ be a weight function.
Let $E_1, \ldots, E_k$ be a partition of the edges, i.e., each edge is colored with one of k colors. For any real
$\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a subgraph H of G and a weight
function $w_H \colon E(H) \to \mathbb{R}_+$ such that

$\mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon) \mathcal{L}_G(w)$,

$(1 - \varepsilon) \sum_{e \in E_i} w_e \leq \sum_{e \in E(H) \cap E_i} w_{H,e} \leq (1 + \varepsilon) \sum_{e \in E_i} w_e$ for all i,

and $|E(H)| = O((n + k)/\varepsilon^2)$.

Hypergraph sparsifiers. Let $\mathcal{H} = (V, \mathcal{E})$ be a hypergraph, and let $w \colon \mathcal{E} \to \mathbb{R}_+$. We follow the definition
of Laplacian for hypergraphs as in [31]. For each hyperedge $E \in \mathcal{E}$, define its Laplacian $\mathcal{L}_E$ as the graph
Laplacian of a graph on V whose edge set forms a clique on E. Define the Laplacian for the hypergraph $\mathcal{H}$
with weight function w as the matrix $\mathcal{L}_{\mathcal{H}}(w) := \sum_{E \in \mathcal{E}} w_E \mathcal{L}_E$.

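Corollary 6 (Spectral sparsifiers for hypergraphs). For any real $\varepsilon \in (0, 1)$, there is a deterministic
polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a weight function $w_{\mathcal{G}} \colon \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such that

$\mathcal{L}_{\mathcal{H}}(w) \preceq \mathcal{L}_{\mathcal{G}}(w_{\mathcal{G}}) \preceq (1 + \varepsilon) \mathcal{L}_{\mathcal{H}}(w)$,

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.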

This corollary concerns spectral sparsifiers. It is also interesting to study sparsifiers that approximately
preserve all cuts. There are several ways to extend the definition of "the weight of a cut" from ordinary
graphs to hypergraphs. We consider the following two definitions, where S is any set of vertices in a
hypergraph $\mathcal{H}$ with edge weights w.

• $w(\delta_{\mathcal{H}}(S))$: This is the sum of the weights of all hyperedges that contain at least one vertex in S and
at least one vertex in $\bar{S} := V \setminus S$.

• $w^*(\delta_{\mathcal{H}}(S))$: This is defined to be $\sum_{E \in \mathcal{E}} w_E \cdot |S \cap E| \cdot |\bar{S} \cap E|$.

Obviously these definitions agree in ordinary graphs.
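The two cut definitions above can be sketched as follows. This is a small illustration, assuming hyperedges are given as (vertex set, weight) pairs; the function names and the example data are ours.

```python
def cut_weight_first(hyperedges, side):
    """First definition: total weight of hyperedges meeting both S and its complement."""
    total = 0.0
    for verts, w in hyperedges:
        inside = len(verts & side)
        if 0 < inside < len(verts):
            total += w
    return total


def cut_weight_second(hyperedges, side):
    """Second definition: sum over hyperedges of w_E * |S cap E| * |complement(S) cap E|."""
    return sum(w * len(verts & side) * len(verts - side) for verts, w in hyperedges)


# Hypothetical hypergraph on vertices 0..4.
hypergraph = [
    (frozenset({0, 1, 2}), 1.0),
    (frozenset({1, 2, 3, 4}), 2.0),
    (frozenset({0, 4}), 0.5),
]
S = {1, 2}
first = cut_weight_first(hypergraph, S)    # hyperedges 1 and 2 are cut
second = cut_weight_second(hypergraph, S)  # 1.0*2*1 + 2.0*2*2 + 0.5*0*2

# On an ordinary graph every hyperedge has two vertices, so a cut edge
# contributes |S cap E| * |complement(S) cap E| = 1 and the definitions agree.
graph = [(frozenset({0, 1}), 1.0), (frozenset({1, 2}), 2.0), (frozenset({0, 2}), 4.0)]
```

For an r-uniform hypergraph the product $|S \cap E| \cdot |\bar{S} \cap E|$ of a cut hyperedge lies between $r - 1$ and $r^2/4$, which is why the two definitions can differ by a factor depending on r.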

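Corollary 7 (Cut sparsifiers for hypergraphs, second definition). For any real $\varepsilon \in (0, 1)$, there is a
deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a weight function $w_{\mathcal{G}} \colon \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$
such that

$w^*(\delta_{\mathcal{H}}(S)) \leq w^*_{\mathcal{G}}(\delta_{\mathcal{G}}(S)) \leq (1 + \varepsilon) w^*(\delta_{\mathcal{H}}(S))$ for every $S \subseteq V$,

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.

Corollary 8 (Cut sparsifiers for hypergraphs, first definition). Assume that $\mathcal{H}$ is an r-uniform hypergraph.
For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$
and a weight function $w_{\mathcal{G}} \colon \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such that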

^^u;(5w(S)) < wg{6giS)) < ^l±l}^w{6n{S)) \/SCV, 

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$. In other words, the sparsified hypergraph $\mathcal{G}$ approximates the weight of the cuts in
the hypergraph $\mathcal{H}$ to within a factor $O(r^2)$.



For the special case r = 3, we can achieve (1 + e)-approximate sparsification for all cuts, even under 
the first definition. 

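Corollary 9 (Cut sparsifiers for 3-uniform hypergraphs). Assume that $\mathcal{H}$ is a 3-uniform hypergraph. For
any $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a
weight function $w_{\mathcal{G}} \colon \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such that

$w(\delta_{\mathcal{H}}(S)) \leq w_{\mathcal{G}}(\delta_{\mathcal{G}}(S)) \leq (1 + \varepsilon) w(\delta_{\mathcal{H}}(S))$ for every $S \subseteq V$,

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.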

Sparse solutions to semidefinite programs. 

Corollary 10. Let $A_1, \ldots, A_m$ be symmetric, positive semidefinite matrices of size $n \times n$, and let B be a
symmetric matrix of size $n \times n$. Let $c \in \mathbb{R}^m$ with $c \geq 0$. Suppose that the semidefinite program (SDP)

$\min \{ c^T z : \sum_i z_i A_i \succeq B, \; z \in \mathbb{R}^m, \; z \geq 0 \}$

has a feasible solution $z^*$. Then, for any real $\varepsilon \in (0, 1)$, it has a feasible solution $\bar{z}$ with at most $O(n/\varepsilon^2)$
nonzero entries and $c^T \bar{z} \leq (1 + \varepsilon) c^T z^*$.

Several important SDPs can be cast as in Corollary 10; see, e.g., [19, 20]. Recently, Jain and Yao [21]
gave a parallel approximation algorithm for SDPs in this form with B positive semidefinite.

Lovász theta number. For a graph $G = (V, E)$ on n nodes, let $t'(G)$ denote the square of the minimum
radius of a Euclidean ball in $\mathbb{R}^n$ such that there is a map from V to points in the ball such that adjacent
vertices are mapped to points at distance at least 1. Also, let $\vartheta'(G)$ denote the variant of the Lovász theta
number introduced in [27] and [35].

Corollary 11. Let $G = (V, E)$ be a graph. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time
algorithm to find a subgraph H of G such that

$(1 - \varepsilon) t'(G) \leq t'(H) \leq t'(G)$

and $|E(H)| = O(n/\varepsilon^2)$.

Corollary 12. Let $G = (V, E)$ be a graph. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time
algorithm to find a supergraph H of G such that

and $|E(H)| = \binom{n}{2} - O(n/\varepsilon^2)$.

Corollary 13. Let G be a graph such that $\vartheta'(G) = o(\sqrt{n})$. For any real $\gamma > 0$, there is a supergraph H of
G such that

1 + 7 

and $|E(H)| = \binom{n}{2} - O(n \, \vartheta'(G)^2/\gamma^2)$.

Corollary 14. Let G be a graph such that $\vartheta'(G) = \Omega(\sqrt{n})$. For any real $\gamma > 1$, there is a supergraph H of
G such that

and\E{H)\ = Q - Oin" h^). 



Approximate Carathéodory theorems. One immediate application for Theorem 3 is an approximate
Carathéodory-type theorem. A classic result of this sort is:

Theorem 15 (Althöfer [2], Lipton-Young [25]). Let $v_1, \ldots, v_m \in [0, 1]^n$ and let $\lambda \in \mathbb{R}^m$ satisfy $\lambda \geq 0$ and
$\sum_i \lambda_i = 1$. Then there exists $\mu \in \mathbb{R}^m$ with $\mu \geq 0$, $\sum_i \mu_i = 1$ and only $O(\log n/\varepsilon^2)$ nonzero entries such
that $\| \sum_i \lambda_i v_i - \sum_i \mu_i v_i \|_\infty \leq \varepsilon$.

This theorem follows from simple random sampling arguments, but it has several interesting
consequences, including the existence of sparse, low-regret solutions to zero-sum games. The following corollary
of Theorem 3 can be viewed as a matrix generalization of Theorem 15.

Corollary 16. Let $B_1, \ldots, B_m$ be symmetric, positive semidefinite matrices of size $n \times n$ and let $\lambda \in \mathbb{R}^m$
satisfy $\lambda \geq 0$ and $\sum_i \lambda_i = 1$. Let $B = \sum_i \lambda_i B_i$. For any $\varepsilon \in (0, 1)$, there exists $\mu \geq 0$ with $\sum_i \mu_i = 1$
such that $\mu$ has $O(n/\varepsilon^2)$ nonzero entries and

$(1 - \varepsilon) B \preceq \sum_i \mu_i B_i \preceq (1 + \varepsilon) B$.



Although the support size in Theorem 15 is much smaller than in Corollary 16, the latter provides a
multiplicative error bound whereas the former only provides an additive error bound. Theorem 15 can be
modified to give multiplicative error bounds if we allow $\mu$ to have $O(n \log n/\varepsilon^2)$ nonzero entries. However,
such a result is not interesting, as Carathéodory's theorem provides a $\mu$ with only n + 1 nonzero entries and
no error (i.e., $\varepsilon = 0$). In contrast, Carathéodory's theorem is very weak in the scenario of Corollary 16, as it
only provides a $\mu$ with $n(n + 1)/2 + 1$ nonzero entries.

Sparsifiers on subgraphs. 

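Corollary 17. Let $G = (V, E)$ be a graph, let $w \colon E \to \mathbb{R}_+$ be a weight function, and let $\mathcal{F}$ be a collection
of subgraphs of G such that $\sum_{F \in \mathcal{F}} |V(F)| = O(n)$. For any real $\varepsilon \in (0, 1)$, there is a deterministic
polynomial-time algorithm to find a subgraph H of G and a weight function $w_H \colon E(H) \to \mathbb{R}_+$ such that
$|E(H)| = O(n/\varepsilon^2)$ and

$\mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon) \mathcal{L}_G(w)$,

$\mathcal{L}_F(w_F) \preceq \mathcal{L}_{H \cap F}(w_H|_{E(H \cap F)}) \preceq (1 + \varepsilon) \mathcal{L}_F(w_F)$ for all $F \in \mathcal{F}$,

where $w_F := w|_{E(F)}$ is the restriction of w to the coordinates E(F) and $H \cap F = (V(F), E(F) \cap E(H))$.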

2 Preliminaries 

For a non-negative integer n, we denote $[n] := \{1, \ldots, n\}$. The non-negative reals are denoted by $\mathbb{R}_+$. The
set of $n \times n$ symmetric matrices is denoted by $\mathbb{S}^n$. The set of symmetric, $n \times n$ positive semidefinite (resp.,
positive definite) matrices is denoted by $\mathbb{S}^n_+$ (resp., $\mathbb{S}^n_{++}$). Recall that $X \in \mathbb{S}^n$ is positive semidefinite if
$v^T X v \geq 0$ for all $v \in \mathbb{R}^n$, and X is positive definite if X is positive semidefinite and $v^T X v = 0$ implies
$v = 0$. Sometimes we denote $X \in \mathbb{S}^n_+$ by $X \succeq 0$, and the notation $X \succeq Y$ means that $X - Y \succeq 0$. For
$X \in \mathbb{S}^n$ and $a, b \in \mathbb{R}$, the notation $X \in [a, b]$ means that $aI \preceq X \preceq bI$, where I is the identity matrix.

For $X \in \mathbb{S}^n$, its trace is $\operatorname{Tr} X := \sum_{i=1}^n X_{ii}$, and its largest (resp., smallest) eigenvalue is denoted by
$\lambda_{\max}(X)$ (resp., $\lambda_{\min}(X)$). The vector space $\mathbb{S}^n$ can be endowed with the trace inner product $\langle \cdot, \cdot \rangle$ defined by
$\langle X, Y \rangle := \operatorname{Tr}(XY) = \sum_{i,j} X_{ij} Y_{ij}$ for every $X, Y \in \mathbb{S}^n$. We shall repeatedly use that $\operatorname{Tr}(XY) = \operatorname{Tr}(YX)$
for any matrices X, Y for which the products XY and YX make sense.
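As a small sanity check of the trace inner product, the sketch below compares $\operatorname{Tr}(XY)$ with the entrywise sum $\sum_{i,j} X_{ij} Y_{ij}$ on two symmetric matrices. The matrices are arbitrary examples of ours, stored as nested lists, and the function names are illustrative.

```python
def trace_of_product(X, Y):
    """Tr(XY) computed from the definition of the matrix product."""
    n = len(X)
    return sum(X[i][k] * Y[k][i] for i in range(n) for k in range(n))


def entrywise_inner_product(X, Y):
    """Sum of X_ij * Y_ij over all entries."""
    n = len(X)
    return sum(X[i][j] * Y[i][j] for i in range(n) for j in range(n))


# Two symmetric 3x3 matrices with integer entries (hypothetical data).
X = [[2, 1, 0], [1, 3, -1], [0, -1, 1]]
Y = [[1, 0, 2], [0, 4, 1], [2, 1, 5]]

tr_xy = trace_of_product(X, Y)
tr_yx = trace_of_product(Y, X)
inner = entrywise_inner_product(X, Y)
```

The two expressions agree because Y is symmetric, so $Y_{ki} = Y_{ik}$ in the expansion of the trace.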



Let $G = (V, E)$ be a graph. The canonical basis vectors of $\mathbb{R}^V$ are $\{ e_i : i \in V \}$, and the canonical basis
vectors of $\mathbb{R}^E$ are $\{ e_{\{i,j\}} : \{i, j\} \in E \}$. The Laplacian of G is the linear transformation $\mathcal{L}_G(\cdot) \colon \mathbb{R}^E \to \mathbb{S}^V$
defined by $\mathcal{L}_G(w) = \sum_{\{i,j\} \in E} w_{\{i,j\}} (e_i - e_j)(e_i - e_j)^T$.

When dealing with Problem 2, we may assume that B = I. See [4, Proof of Theorem 1.1] for the details
of the reduction.

3 Solving Problem 2 by Ahlswede-Winter

As mentioned earlier, Spielman and Srivastava [38] explain how Problem 1 can be solved by Rudelson's
sampling lemma. This lemma can be easily generalized to handle matrices of arbitrary rank using the
Ahlswede-Winter inequality, yielding a solution to Problem 2.

Let X be a random matrix such that $X = B_i / \operatorname{Tr} B_i$ with probability $p_i := \operatorname{Tr} B_i / \operatorname{Tr} I$. Since $B_i \succeq 0$
and $\sum_i B_i = I$, the $p_i$'s define a probability distribution.

Theorem 18 ([1, Theorem 19]). Let $X, X_1, \ldots, X_T$ be i.i.d. random variables with values in $\mathbb{S}^n$ such that
$X_i \in [0, 1]$ for every i and $E(X) = \mu I$ with $\mu \in [0, 1]$. Let $\varepsilon \in (0, 1/2)$. Then

$P\left( \frac{1}{T} \sum_{i=1}^T X_i \notin [(1 - \varepsilon)\mu, (1 + \varepsilon)\mu] \right) \leq 2n \exp\left( -\frac{T \varepsilon^2 \mu}{2 \ln 2} \right)$.

In our case, $E(X) = (1/n) I$ and $X \in [0, 1]$. So $\mu = 1/n$. Thus, if $T \geq (2 \ln 2) \cdot n \ln(4n)/\varepsilon^2 =
O(n \log n/\varepsilon^2)$, then $P\left( \frac{n}{T} \sum_{i=1}^T X_i \notin [1 - \varepsilon, 1 + \varepsilon] \right) \leq 1/2$. Thus, with constant probability, we obtain a
solution y to Problem 2, where y has only $O(n \log n/\varepsilon^2)$ nonzero entries.
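The sample count in this argument can be made concrete with a short calculation. The sketch below assumes that the tail bound of Theorem 18 has the form $2n \exp(-T \varepsilon^2 \mu / (2 \ln 2))$, which is our reading of the Ahlswede-Winter bound rather than something fully legible in the statement above; the function names are illustrative.

```python
import math


def aw_failure_bound(n, mu, eps, T):
    """Assumed tail bound 2n * exp(-T * eps^2 * mu / (2 ln 2)) for T i.i.d. samples."""
    return 2 * n * math.exp(-T * eps ** 2 * mu / (2 * math.log(2)))


def aw_sample_count(n, eps):
    """Smallest integer T with T >= (2 ln 2) * n * ln(4n) / eps^2."""
    return math.ceil(2 * math.log(2) * n * math.log(4 * n) / eps ** 2)


# With mu = 1/n (the case B = I), this many samples push the bound down to 1/2.
n, eps = 10, 0.25
T = aw_sample_count(n, eps)
bound = aw_failure_bound(n, 1.0 / n, eps, T)
```

Under that assumption, substituting $\mu = 1/n$ and $T = (2 \ln 2) \, n \ln(4n)/\varepsilon^2$ gives $2n \exp(-\ln(4n)) = 1/2$, matching the failure probability used in the text.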

4 Solving Problem 2 by BSS

In our modification of the BSS algorithm [4], we keep a matrix A of the form $A = \sum_i y_i B_i$ with $y \geq 0$,
starting with A = 0, and at each iteration we add another term $\alpha B_j$ to A. We enforce the invariant that the
eigenvalues of A lie in $[\ell, u]$, where u and $\ell$ are parameters given by $u = u_0 + t \delta_U$ and $\ell = \ell_0 + t \delta_L$ after
t iterations. This procedure is presented in Algorithm 1. The step of the algorithm which finds $B_j$ and $\alpha$
can be done by exhaustive search on j and binary search on $\alpha$. Instead of the binary search, one could also
compare the quantities $U_{A(t-1)}(B_j)$ and $L_{A(t-1)}(B_j)$ defined below.

In the original BSS algorithm, the matrices are rank one: $B_j = v_j v_j^T$ for some vector $v_j$. Their
Lemmas 3.3 and 3.4 give sufficient conditions on the new term $\alpha v_j v_j^T$ so that the invariant on the eigenvalues is
maintained; Lemma 3.5 gives sufficient conditions on the remaining parameters so that a suitable new term
$\alpha v_j v_j^T$ exists with $\alpha > 0$. In this section we generalize those lemmas to allow $B_i$ matrices of arbitrary rank.

Let $A \in \mathbb{S}^n$. If $u \in \mathbb{R}$ with $\lambda_{\max}(A) < u$, define $\Phi^u(A) := \operatorname{Tr}(uI - A)^{-1}$. If $\ell \in \mathbb{R}$ with $\lambda_{\min}(A) > \ell$,
define $\Phi_\ell(A) := \operatorname{Tr}(A - \ell I)^{-1}$. Note that $\Phi_\ell(A) = \sum_i 1/(\lambda_i - \ell)$ and $\Phi^u(A) = \sum_i 1/(u - \lambda_i)$, where
$\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A.

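Lemma 19 (Analog of Lemma 3.3 in [4]). Let $A \in \mathbb{S}^n$ and $X \in \mathbb{S}^n_+$ with $X \neq 0$. Let $u \in \mathbb{R}$ and $\delta_U > 0$.
Suppose $\lambda_{\max}(A) < u$. Let $u' := u + \delta_U$ and $M := u'I - A$. If

$\frac{1}{\alpha} \geq \frac{\langle M^{-2}, X \rangle}{\Phi^u(A) - \Phi^{u'}(A)} + \langle M^{-1}, X \rangle =: U_A(X)$,

then $\lambda_{\max}(A + \alpha X) < u'$ and $\Phi^{u'}(A + \alpha X) \leq \Phi^u(A)$.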



Algorithm 1 A procedure for solving Problem 2 based on the BSS method.
procedure SparsifySumOfMatricesByBSS($B_1, \ldots, B_m$, $\varepsilon$)
  input: Matrices $B_1, \ldots, B_m \in \mathbb{S}^n_+$ such that $\sum_i B_i = I$, and a parameter $\varepsilon \in (0, 1)$.
  output: A vector y with $O(n/\varepsilon^2)$ nonzero entries such that $I \preceq \sum_i y_i B_i \preceq (1 + O(\varepsilon)) I$.
  Initially $A(0) := 0$ and $y(0) := 0$. Set parameters $u_0, \ell_0, \delta_L, \delta_U$ as in (5) and $T := 4n/\varepsilon^2$.
  Define the potential functions $\Phi^u(A) := \operatorname{Tr}(uI - A)^{-1}$ and $\Phi_\ell(A) := \operatorname{Tr}(A - \ell I)^{-1}$.
  For $t = 1, \ldots, T$
    Set $u_t := u_{t-1} + \delta_U$ and $\ell_t := \ell_{t-1} + \delta_L$.
    Find a matrix $B_j$ and a value $\alpha > 0$ such that $A(t-1) + \alpha B_j \in [\ell_t, u_t]$, and
      $\Phi^{u_t}(A(t-1) + \alpha B_j) \leq \Phi^{u_{t-1}}(A(t-1))$ and $\Phi_{\ell_t}(A(t-1) + \alpha B_j) \leq \Phi_{\ell_{t-1}}(A(t-1))$.
    Set $A(t) := A(t-1) + \alpha B_j$ and $y(t) := y(t-1) + \alpha e_j$.
  Return $y(T)/\lambda_{\min}(A(T))$.

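Proof. Clearly $M \succ 0$. Let $V := X^{1/2}$. By the Sherman-Morrison-Woodbury formula [18],

$\Phi^{u'}(A + \alpha X) = \operatorname{Tr}(M - \alpha V V^T)^{-1} = \operatorname{Tr}\left( M^{-1} + \alpha M^{-1} V (I - \alpha V^T M^{-1} V)^{-1} V^T M^{-1} \right)$
$= \Phi^{u'}(A) + \operatorname{Tr}\left( \alpha M^{-1} V (I - \alpha V^T M^{-1} V)^{-1} V^T M^{-1} \right)$.

Since $M^{-1} \succ 0$, $X \neq 0$ and $\Phi^u(A) > \Phi^{u'}(A)$, our hypotheses imply $1/\alpha > \langle M^{-1}, X \rangle = \operatorname{Tr}(V^T M^{-1} V) \geq
\lambda_{\max}(V^T M^{-1} V) > 0$, so $\beta := \lambda_{\min}(I - \alpha V^T M^{-1} V) = 1 - \alpha \lambda_{\max}(V^T M^{-1} V) > 0$ and by, e.g., [18,
Corollary 7.7.4],

$0 \prec \beta I \preceq I - \alpha V^T M^{-1} V \implies 0 \prec (I - \alpha V^T M^{-1} V)^{-1} \preceq \beta^{-1} I$.

Thus,

$\Phi^{u'}(A + \alpha X) \leq \Phi^{u'}(A) + \alpha \beta^{-1} \operatorname{Tr}(V^T M^{-2} V)$
$= \Phi^u(A) - (\Phi^u(A) - \Phi^{u'}(A)) + \alpha \beta^{-1} \langle M^{-2}, X \rangle$.

To prove that $\Phi^{u'}(A + \alpha X) \leq \Phi^u(A)$, it suffices to show that $\alpha \beta^{-1} \langle M^{-2}, X \rangle \leq \Phi^u(A) - \Phi^{u'}(A)$.
This is equivalent to

$\frac{\langle M^{-2}, X \rangle}{1/\alpha - \lambda_{\max}(V^T M^{-1} V)} \leq \Phi^u(A) - \Phi^{u'}(A)$,

which follows from $1/\alpha \geq U_A(X)$ since $\lambda_{\max}(V^T M^{-1} V) \leq \operatorname{Tr}(V^T M^{-1} V) = \langle M^{-1}, X \rangle$.

It remains to show that $\lambda_{\max}(A + \alpha X) < u'$. Suppose not. Choose $\varepsilon \in (0, \delta_U)$ such that $1/\varepsilon > \Phi^u(A)$.
By continuity, for some $\alpha' \in (0, \alpha)$ we have $\lambda_{\max}(A + \alpha' X) = u' - \varepsilon$. Since $1/\alpha' > 1/\alpha \geq U_A(X)$, we
get $\Phi^{u'}(A + \alpha' X) \geq 1/\varepsilon > \Phi^u(A) \geq \Phi^{u'}(A + \alpha' X)$, a contradiction. □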

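Lemma 20 (Analog of Lemma 3.4 in [4]). Let $A \in \mathbb{S}^n$ and $X \in \mathbb{S}^n_+$, with $n \geq 2$. Let $\ell \in \mathbb{R}$ and $\delta_L > 0$.
Suppose $\lambda_{\min}(A) > \ell$ and $\Phi_\ell(A) \leq 1/\delta_L$. Let $\ell' := \ell + \delta_L$ and $N := A - \ell' I$. If

$0 < \frac{1}{\alpha} \leq \frac{\langle N^{-2}, X \rangle}{\Phi_{\ell'}(A) - \Phi_\ell(A)} - \langle N^{-1}, X \rangle =: L_A(X)$,

then $\lambda_{\min}(A + \alpha X) > \ell'$ and $\Phi_{\ell'}(A + \alpha X) \leq \Phi_\ell(A)$. Moreover, $N \succ 0$.

Proof. Note that $\lambda_{\min}(A) > \ell$ and $\Phi_\ell(A) \leq 1/\delta_L$ imply that $N \succ 0$, and therefore $\lambda_{\min}(A + \alpha X) > \ell'$.
Let $V := X^{1/2}$. By the Sherman-Morrison-Woodbury formula,

$\Phi_{\ell'}(A + \alpha X) = \operatorname{Tr}(N + \alpha V V^T)^{-1} = \operatorname{Tr}\left( N^{-1} - \alpha N^{-1} V (I + \alpha V^T N^{-1} V)^{-1} V^T N^{-1} \right)$
$= \Phi_{\ell'}(A) - \operatorname{Tr}\left( \alpha N^{-1} V (I + \alpha V^T N^{-1} V)^{-1} V^T N^{-1} \right)$.

For $\beta := \lambda_{\max}(I + \alpha V^T N^{-1} V)$, we have

$0 \prec I + \alpha V^T N^{-1} V \preceq \beta I \implies 0 \prec \beta^{-1} I \preceq (I + \alpha V^T N^{-1} V)^{-1}$.

Thus,

$\Phi_{\ell'}(A + \alpha X) \leq \Phi_{\ell'}(A) - \alpha \beta^{-1} \operatorname{Tr}(V^T N^{-2} V)$
$= \Phi_\ell(A) + (\Phi_{\ell'}(A) - \Phi_\ell(A)) - \alpha \beta^{-1} \langle N^{-2}, X \rangle$.

We will be done if we show that $\alpha \beta^{-1} \langle N^{-2}, X \rangle \geq \Phi_{\ell'}(A) - \Phi_\ell(A)$. This is equivalent to

$\frac{\langle N^{-2}, X \rangle}{1/\alpha + \lambda_{\max}(V^T N^{-1} V)} \geq \Phi_{\ell'}(A) - \Phi_\ell(A)$,

which follows from $0 < 1/\alpha \leq L_A(X)$, since $\Phi_{\ell'}(A) > \Phi_\ell(A)$, $N \succ 0$, and $\lambda_{\max}(V^T N^{-1} V) \leq
\operatorname{Tr}(V^T N^{-1} V) = \langle N^{-1}, X \rangle$. □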

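The next lemma can be proved by a syntactic modification of the proof of Lemma 3.5 in [4].

Lemma 21 (Analog of Lemma 3.5 in [4]). Let $A \in \mathbb{S}^n$ with $n \geq 2$, and let $u, \ell \in \mathbb{R}$ and $\varepsilon_U, \delta_U, \varepsilon_L, \delta_L > 0$
such that $\lambda_{\max}(A) < u$, $\lambda_{\min}(A) > \ell$, $\Phi^u(A) \leq \varepsilon_U$, and $\Phi_\ell(A) \leq \varepsilon_L$. Let $B_1, \ldots, B_m \in \mathbb{S}^n_+$ such that
$\sum_i B_i = I$. If

$0 \leq \frac{1}{\delta_U} + \varepsilon_U \leq \frac{1}{\delta_L} - \varepsilon_L$ (3)

then there exist $j \in [m]$ and $\alpha > 0$ for which $L_A(B_j) \geq 1/\alpha \geq U_A(B_j)$.

Proof. As in [4, Lemma 3.5], it suffices to show that $\sum_i L_A(B_i) \geq \sum_i U_A(B_i)$. Let $u' := u + \delta_U$,
$M := u'I - A$, $\ell' := \ell + \delta_L$, and $N := A - \ell' I$. It follows from the bilinearity of $\langle \cdot, \cdot \rangle$ and the assumption
$\sum_i B_i = I$ that

$\sum_i U_A(B_i) = \frac{\operatorname{Tr} M^{-2}}{\Phi^u(A) - \Phi^{u'}(A)} + \operatorname{Tr} M^{-1}$, (4a)

$\sum_i L_A(B_i) = \frac{\operatorname{Tr} N^{-2}}{\Phi_{\ell'}(A) - \Phi_\ell(A)} - \operatorname{Tr} N^{-1}$. (4b)

It is shown in [4, Lemma 3.5] that (4a) is at most (4b), completing the proof. □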

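Now we set the parameters of Lemma 21 similarly as in [4]:

$\delta_L := 1$, $\varepsilon_L := \frac{\varepsilon}{2}$, $\ell_0 := -\frac{n}{\varepsilon_L}$, $\delta_U := \frac{2 + \varepsilon}{2 - \varepsilon}$, $\varepsilon_U := \frac{\varepsilon}{2 \delta_U}$, $u_0 := \frac{n}{\varepsilon_U}$. (5)

So (3) holds with equality. If A is the matrix obtained after $T = 4n/\varepsilon^2$ iterations, then

$\frac{\lambda_{\max}(A)}{\lambda_{\min}(A)} \leq \frac{u_0 + T \delta_U}{\ell_0 + T \delta_L} = \left( \frac{2 + \varepsilon}{2 - \varepsilon} \right)^2 \leq \frac{1 + \varepsilon}{1 - \varepsilon}$,

so $A' := A/\lambda_{\min}(A)$ satisfies $I \preceq A' \preceq (1 + \varepsilon) I/(1 - \varepsilon)$ and $A'$ is a positive linear combination of $O(n/\varepsilon^2)$
of the matrices $B_i$.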
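To make the procedure concrete, here is a minimal sketch of Algorithm 1 restricted to diagonal input matrices, so that eigenvalues are simply diagonal entries and no linear algebra library is needed. It is an illustration under stated assumptions, not the authors' implementation: the parameter values follow our reading of (5), the choice of $1/\alpha$ as the midpoint of $[U_A(B_j), L_A(B_j)]$ is one valid option among many, and the function name and test data are hypothetical.

```python
import math


def bss_sparsify_diagonal(diags, eps):
    """Sketch of Algorithm 1 for diagonal PSD matrices.

    diags[i] is the diagonal of B_i (nonnegative, not all zero), and the
    diagonals are assumed to sum to the all-ones vector (sum_i B_i = I).
    Returns (y, spectrum) with spectrum the diagonal of sum_i y_i B_i.
    """
    m, n = len(diags), len(diags[0])
    # Parameters as we read them from (5); (3) then holds with equality.
    delta_l, eps_l = 1.0, eps / 2.0
    delta_u = (2.0 + eps) / (2.0 - eps)
    eps_u = eps / (2.0 * delta_u)
    u, l = n / eps_u, -n / eps_l
    T = math.ceil(4 * n / eps ** 2)

    a = [0.0] * n  # diagonal of A, i.e. its eigenvalues
    y = [0.0] * m
    for _ in range(T):
        u_new, l_new = u + delta_u, l + delta_l
        # Denominators of U_A and L_A: differences of the barrier potentials.
        d_up = sum(1 / (u - x) for x in a) - sum(1 / (u_new - x) for x in a)
        d_lo = sum(1 / (x - l_new) for x in a) - sum(1 / (x - l) for x in a)
        best = None
        for j, b in enumerate(diags):
            upper = (sum(bk / (u_new - x) ** 2 for bk, x in zip(b, a)) / d_up
                     + sum(bk / (u_new - x) for bk, x in zip(b, a)))
            lower = (sum(bk / (x - l_new) ** 2 for bk, x in zip(b, a)) / d_lo
                     - sum(bk / (x - l_new) for bk, x in zip(b, a)))
            # Lemma 21 guarantees some j with lower >= upper; take the largest gap.
            if best is None or lower - upper > best[0]:
                best = (lower - upper, j, upper, lower)
        _, j, upper, lower = best
        alpha = 2.0 / (upper + lower)  # 1/alpha is the midpoint of [U_A, L_A]
        a = [x + alpha * bk for x, bk in zip(a, diags[j])]
        y[j] += alpha
        u, l = u_new, l_new
    scale = min(a)  # lambda_min(A(T))
    return [v / scale for v in y], [x / scale for x in a]


# Hypothetical input: 200 diagonal matrices of size 3 whose sum is the identity.
n, m = 3, 200
raw = [[(2 * i + 3 * k) % 5 for k in range(n)] for i in range(m)]
col = [sum(row[k] for row in raw) for k in range(n)]
diags = [[row[k] / col[k] for k in range(n)] for row in raw]
y, spectrum = bss_sparsify_diagonal(diags, 0.5)
support = sum(1 for v in y if v > 0)
```

With $\varepsilon = 1/2$ and $n = 3$ the loop runs $T = 48$ times, so at most 48 of the 200 coefficients are nonzero, and the rescaled spectrum lies in $[1, (1 + \varepsilon)/(1 - \varepsilon)] = [1, 3]$. For general (non-diagonal) inputs the same loop applies with the sums replaced by traces of products with $M^{-1}$, $M^{-2}$, $N^{-1}$ and $N^{-2}$.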

It is easy to check that the previous lemmas also hold if we replace the set $\mathbb{S}^n$ of symmetric matrices of
size $n \times n$ by the set $\mathbb{H}^n$ of Hermitian matrices of size $n \times n$.



4.1 Running Time 

At each iteration, we must compute $U_A(B_j)$ and $L_A(B_j)$ for each $j \in [m]$. The functions $U_A(X)$ and
$L_A(X)$ are the inner products of X with certain matrices that can be obtained from A in time $O(n^3)$. Thus,
each iteration runs in time $O(n^3 + mn^2) = O(mn^2)$, and the total running time after $T = 4n/\varepsilon^2$ iterations is
$O(mn^3/\varepsilon^2)$. We remark that the reduction to the case B = I can be made in time $O(mn^3)$. This concludes
the proof of Theorem 3.

If the matrices $B_i$ have O(1) nonzero entries, as in the graph sparsification problem, the algorithm can
be made to run in time $O(n^4/\varepsilon^2 + mn/\varepsilon^2)$. We briefly sketch the details. To reduce the problem to the case
that B = I, we first compute $(B^+)^{1/2}$, where $B^+$ is the Moore-Penrose pseudoinverse of B. Define the
function $f(X) := (B^+)^{1/2} X (B^+)^{1/2}$ on $\mathbb{S}^n$.

The reduction now calls for replacing each input matrix $B_i$ by $f(B_i)$ and the matrix B by $f(B)$. But
we shall not do this. Instead, we do some preprocessing at each iteration as follows. The function $U_A(X)$
(as well as $L_A(X)$) is the inner product of X with a certain matrix V. Hence, $U_A(f(B_j)) = \langle V, f(B_j) \rangle =
\langle f(V), B_j \rangle$ for every j, since f is self-adjoint. Thus, to compute $U_A(f(B_j))$ for each j, we first compute
the matrix $f(V)$ in time $O(n^3)$, and now the inner product $U_A(f(B_j)) = \langle f(V), B_j \rangle$ can be computed in
constant time for each j, since $B_j$ has O(1) nonzero entries. Thus, each iteration runs in time $O(n^3 + m)$
and the total running time is $O(n^4/\varepsilon^2 + mn/\varepsilon^2)$.
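The self-adjointness of f used in this argument follows in one line from the cyclic property of the trace. Writing $S := (B^+)^{1/2}$ (a shorthand introduced here only for this derivation), which is symmetric, we have for all symmetric V and X:

```latex
\[
\langle V, f(X) \rangle
  = \operatorname{Tr}(V \, S X S)
  = \operatorname{Tr}(S V S \, X)
  = \langle f(V), X \rangle .
\]
```

The per-iteration cost then splits as one $O(n^3)$ computation of $f(V)$ plus $O(1)$ work for each of the m sparse matrices, and multiplying $O(n^3 + m)$ by $T = 4n/\varepsilon^2$ iterations gives the stated total.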

5 Solving Problem 2 by MMWUM

Observe that the set of all vectors y that are feasible for (2) is the feasible region of a semidefinite program
(SDP). So solving Problem 2 amounts to finding a sparse solution to this SDP. Here "sparse" means that
there are few nonzero entries in the solution y; this differs from other notions of "low-complexity" SDP
solutions, such as the low-rank solutions studied by So, Ye and Zhang [36].

It has long been known that the multiplicative weight update method can be used to construct
sparse solutions for some linear programs. A prominent example is the construction of sparse, low-regret 
solutions to zero-sum games [9, 43, 44 J. (Another example is the work of Charikar et al. Q on approxi- 
mating metrics by few tree metrics.) Building on that idea, one might imagine that Arora and Kale's matrix 
multiplicative update method (MMWUM) [3] can construct sparse solutions to (2). In this section, we show
that this is indeed possible: we obtain a solution y to Problem 2 with $O(n \log n/\varepsilon^3)$ nonzero entries.

5.1 Overview of MMWUM 

The MMWUM is an algorithm that helps us approximately solve an SDP feasibility problem. The gist of (a
slight modification of) the method is contained in the following result (its proof can be found in Appendix B):

Theorem 22. Let T,K,ni, . . . , rix be positive integers. Let CkjAi^i;., . . . , A^^k ^ S"'' far k G [K]. For 
each k G [K], let r]k > and < /3fc < 1/2. Given Xi, . . . , Xk G S", consider the system 

m 

J2 y^{Ak,Xk) > {Ck,Xk) - Vk Tr Xfc, Vfe G [K], and y G M^. (6) 

1=1 

For each k G [K], Jet {Vk,Mk} be a partition of [T], Jet < £k < Ph and Jet W^*^ G S" and £^*^ G Rfar 
t G [r + 1]. Let y^'^' G W^ for t G [T]. Suppose thefaJJowing properties hoJd: 

Wf^ = exp --^ Y. E yJ'^^^.'^ -C>^ + i^^ ' Vt G {0, . . . , T}, Vfc G [K], 



y = y^^^i is a solution for Q with Xk = W^*\ V/c G [K], \ft G [T], 
y)M,,fc-CfcGr -..^A/- yt £ [T], k £ [K], 

£^=4, VtGPfc, VA:G [iC], a?iJ 4*^ = -4, Vi G A4, V/c G [isT]. 
Define y := y X]t=i 2/ • T'/j^'^. 

X] yiAi,k -Ckt- k4 + ^^"^ t,y ^""^ + (1 + /?fc)r?fcj J, VA;G[K]. (7) 

i=i -'^'^ 

Take K = 2, set $C_1 := I$ and $C_2 := -I$, and put $A_{i,1} := B_i$ and $A_{i,2} := -B_i$ for each $i \in [m]$. Then
Theorem 22 shows that finding a solution for (2) reduces to constructing an oracle that solves linear systems
of the form (6) with a few extra technical properties involving the parameters $\ell_k$ and $\rho_k$, and adjusting the
other parameters so that the error term on the right-hand side of (7) is at most $\varepsilon$.

To obtain a feasible solution for (2) that is also sparse, the idea is to design an implementation of the
oracle that returns a vector $y^{(t)}$ with only one nonzero entry at each iteration t of MMWUM, and to adjust the
parameters so that, after $T = O(n \log n/\varepsilon^3)$ iterations, the smallest and largest eigenvalues of $\sum_{i=1}^m \bar{y}_i B_i$
are $\varepsilon$-close to 1. Since $\bar{y}$ is the average of the $y^{(t)}$'s, the resulting $\bar{y}$ will have at most T nonzero entries.

We set the remaining parameters as follows: 

a o a ^ 2(/) + ^)lnn £ 

/3:=^i:=/32:=^, T := , , := ,1 := ,2 := -, 

£:=4:=^2:=1, p := pi := p2 := ^^^n, Vi := M2 := [T], A^i := P2 := 0- 

V 

Then the error term on the right-hand side of (|7]l is 

''^+^Si— + <' + «'' = 4 + 2 + + 4)1 = T+32 S - W 

Thus, (2) follows from (7) and (8). Moreover, $T = O(n \log n/\varepsilon^3)$, as desired.

5.2 The Oracle 

It remains to implement the oracle. Consider an iteration t, and let $X_1 := W_1^{(t)}$ and $X_2 := W_2^{(t)}$ be given.
We must find $y := y^{(t)} \in \mathbb{R}^m_+$ with at most one nonzero entry such that

$\sum_{i=1}^m y_i \langle X_1, B_i \rangle \geq (1 - \eta) \operatorname{Tr} X_1$, $\sum_{i=1}^m y_i \langle X_2, B_i \rangle \leq (1 + \eta) \operatorname{Tr} X_2$, and $\sum_{i=1}^m y_i B_i \in [0, \rho]$.

Since y should have only one nonzero entry, it suffices to find $j \in [m]$ and $\alpha \in \mathbb{R}_+$ such that

$\alpha \langle X_1, B_j \rangle \geq (1 - \eta) \operatorname{Tr} X_1$,
$\alpha \langle X_2, B_j \rangle \leq (1 + \eta) \operatorname{Tr} X_2$, (9)
$\alpha \operatorname{Tr} B_j \leq \rho$.

Here we are using the fact that $\lambda_{\max}(B_j) \leq \operatorname{Tr} B_j$ since $B_j \succeq 0$. We will show that such j and $\alpha$ exist. Due
to the definition of $W_1$ and $W_2$, the oracle can assume that $X_1$ is a scalar multiple of $X_2^{-1}$, although we will
not make use of that fact.

Proposition 23. Let $B_1, \ldots, B_m \in \mathbb{S}^n_+$ such that $\sum_{i=1}^m B_i = I$. Let $\eta > 0$ and $X_1, X_2 \in \mathbb{S}^n_{++}$. Then, for
$\rho := (1 + \eta) n/\eta$, there exist $j \in [m]$ and $\alpha \geq 0$ such that (9) holds.

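Proof. By possibly dropping some $B_i$'s, we may assume that $B_i \neq 0$ for every $i \in [m]$. Define $p_i :=
\langle X_1, B_i \rangle / \operatorname{Tr} X_1 > 0$ for every $i \in [m]$. Consider the probability space on [m] where j is sampled from [m]
with probability $p_j$. The fact that $\sum_{j=1}^m p_j = 1$ follows from $\sum_{i=1}^m B_i = I$. Then $E_j[p_j^{-1} \operatorname{Tr} B_j] =
\sum_{i=1}^m \operatorname{Tr} B_i = \operatorname{Tr} I = n$. By Markov's inequality,

$P\left( p_j^{-1} \operatorname{Tr} B_j \leq \frac{1 + \eta}{\eta} n \right) = 1 - P\left( p_j^{-1} \operatorname{Tr} B_j > \frac{1 + \eta}{\eta} n \right) \geq 1 - \frac{\eta}{1 + \eta} = \frac{1}{1 + \eta}$. (10)

Next note that $E_j[p_j^{-1} \langle X_2, B_j \rangle] = \sum_{i=1}^m \langle X_2, B_i \rangle = \langle X_2, I \rangle = \operatorname{Tr} X_2$. Together with Markov's
inequality, this yields

$P\left( p_j^{-1} \langle X_2, B_j \rangle \leq (1 + \eta) \operatorname{Tr} X_2 \right) = 1 - P\left( p_j^{-1} \langle X_2, B_j \rangle > (1 + \eta) \operatorname{Tr} X_2 \right) > 1 - \frac{1}{1 + \eta}$. (11)

It follows from (10) and (11) that there exists $j \in [m]$ satisfying

$p_j^{-1} \langle X_2, B_j \rangle \leq (1 + \eta) \operatorname{Tr} X_2$, and $p_j^{-1} \operatorname{Tr} B_j \leq \frac{1 + \eta}{\eta} n = \rho$.

Set $\alpha := p_j^{-1}$ and note that

$\alpha \langle X_1, B_j \rangle = p_j^{-1} \langle X_1, B_j \rangle = \operatorname{Tr} X_1 \geq (1 - \eta) \operatorname{Tr} X_1$.

Hence, j and $\alpha$ satisfy (9). □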
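The existence argument above translates directly into an exhaustive search over j. The following sketch does this for diagonal matrices, represented by their diagonals, so that inner products and traces are plain sums; the function name and the example data are our own, and the choice $\alpha = 1/p_j$ follows the proof.

```python
def oracle_step(diags, x1, x2, eta):
    """Search for (j, alpha) satisfying the three inequalities of (9).

    diags[j] is the diagonal of B_j (the B_j are assumed to sum to I),
    and x1, x2 are the diagonals of positive definite X_1, X_2.
    """
    n = len(x1)
    rho = (1 + eta) * n / eta
    tr_x1, tr_x2 = sum(x1), sum(x2)
    for j, b in enumerate(diags):
        p = sum(xk * bk for xk, bk in zip(x1, b)) / tr_x1  # p_j from the proof
        if p == 0:
            continue  # B_j = 0 can be dropped
        alpha = 1 / p  # makes alpha * <X_1, B_j> equal to Tr X_1
        second_ok = alpha * sum(xk * bk for xk, bk in zip(x2, b)) <= (1 + eta) * tr_x2
        width_ok = alpha * sum(b) <= rho
        if second_ok and width_ok:
            return j, alpha
    return None  # excluded by Proposition 23 when the assumptions hold


# Hypothetical input: diagonal matrices of size 3 summing to the identity.
n, m = 3, 200
raw = [[(2 * i + 3 * k) % 5 for k in range(n)] for i in range(m)]
col = [sum(row[k] for row in raw) for k in range(n)]
diags = [[row[k] / col[k] for k in range(n)] for row in raw]
x1, x2, eta = [1.0, 2.0, 3.0], [3.0, 0.5, 1.0], 0.25
j, alpha = oracle_step(diags, x1, x2, eta)
```

The search costs one pass over the m matrices per MMWUM iteration; the width bound $\rho = (1 + \eta) n/\eta$ is what later forces the extra factor in the number of iterations.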

The following proposition, proven in Appendix C, shows that the parameters achieved by Proposition 23
are essentially optimal.

Proposition 24. Any oracle for satisfying (9) must have $\rho = \Omega(n/\eta)$, even if the $B_i$ matrices have rank one,
and even if $X_1$ is a scalar multiple of $X_2^{-1}$.

We also point out that a naive application of MMWUM as stated by Kale in [22] does not work. In
his description of MMWUM, the parameter K is fixed as 1. So we must correspondingly adjust our input
matrices to be block-diagonal, e.g., C has two blocks: I and -I. However, applying Theorem 22 in this
manner would lead to a sparsifier with $\Omega(n^2)$ edges. The reason is that the parameter $\rho$ needs to be $\Omega(n)$,
and we must choose $\ell = \rho$ since the spectrum of $\sum_{i=1}^m y_i A_i - C$ is symmetric around zero for any y. Thus,
to get the error term on the right-hand side of (7) to be at most $\varepsilon$, we would need to take $T = \Omega(n^2)$.

6 Solving Problem 2 by a Width-Free MMWUM

The algorithm of Section 5 solves Problem 2 with only $O(n \log n/\varepsilon^3)$ nonzero entries, which is slightly
worse than the $O(n \log n/\varepsilon^2)$ nonzero entries achieved by the Ahlswede-Winter method discussed in
Section 3. The main reason for this discrepancy is that MMWUM requires us to bound the "width" of the oracle
using the parameter $\rho$; formally, the oracle must satisfy the inequality $\alpha \operatorname{Tr} B_j \leq \rho$ in (9). In order to satisfy this
width constraint, the oracle loses an extra factor of $O(1/\varepsilon)$, and this is necessary as shown in Proposition 24.
In this section, we slightly refine MMWUM to avoid its dependence on the width. This allows us to
simplify our oracle and avoid losing the extra factor of $O(1/\varepsilon)$. We obtain a solution to Problem 2 with
only $O(n \log n/\varepsilon^2)$ nonzero entries, matching the sparsity of the solutions obtained by the Ahlswede-Winter
inequality.

The following theorem is our width-free variant of MMWUM. We remark that the method described
in this theorem is geared towards solving Problem 2 and is not necessarily useful for all applications of
MMWUM.

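Theorem 25. Let T be a positive integer. Let $B_1, \ldots, B_m \in \mathbb{S}^n_+$ be nonzero. Let $\gamma, \eta, \delta_L, \delta_U > 0$. For any
given $X_L, X_U \in \mathbb{S}^n$, consider the system

$\delta_U \geq \frac{\exp(\gamma \alpha \operatorname{Tr} B_j) - 1}{\operatorname{Tr} B_j} \langle X_U, B_j \rangle$,

$\delta_L \leq \frac{1 - \exp(-\gamma \alpha \operatorname{Tr} B_j)}{\operatorname{Tr} B_j} \langle X_L, B_j \rangle$, (12)

$\alpha \in \mathbb{R}_+$, $j \in [m]$.

For each $t \in \{0, \ldots, T + 1\}$, let $A(t), W_L(t), W_U(t) \in \mathbb{S}^n$, let $\alpha(t) \in \mathbb{R}_+$, and let $j(t) \in [m]$. Suppose the
following properties hold:

$A(t) = \sum_{\tau=1}^{t} \alpha(\tau) B_{j(\tau)}$, for all $t \in \{0, \ldots, T\}$,

$W_U(t + 1) = \exp(\gamma A(t))$ and $W_L(t + 1) = \exp(-\gamma A(t))$, for all $t \in \{0, \ldots, T\}$,

$(\alpha(t), j(t))$ is a solution of (12) with $X_U = W_U(t)/\operatorname{Tr} W_U(t)$ and $X_L = W_L(t)/\operatorname{Tr} W_L(t)$, for all $t \in [T]$.

Then

$\frac{1}{T} A(T) \in \left[ -\frac{\log(1 - \delta_L)}{\gamma} - \frac{\log n}{T \gamma}, \; \frac{\log(1 + \delta_U)}{\gamma} + \frac{\log n}{T \gamma} \right]$. (13)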

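Proof. We will use the Golden-Thompson inequality:

\[ \operatorname{Tr}(\exp(A + B)) \le \operatorname{Tr}(\exp(A) \exp(B)), \quad \forall A, B \in \mathbb{S}^n. \tag{14} \]

We will also make use of the following facts. First,

\[ \exp(cx) \le 1 + \frac{\exp(cb) - 1}{b}\, x, \quad \forall c \in \mathbb{R},\ b > 0,\ x \in [0, b]. \]

For $X \in \mathbb{S}^n_+$, we have $\lambda_{\max}(X) \le \operatorname{Tr} X$, so the eigenvalues of $X$ lie in $[0, \operatorname{Tr} X]$, and

\[ \exp(cX) \preceq I + \frac{\exp(c \operatorname{Tr} X) - 1}{\operatorname{Tr} X}\, X. \tag{15} \]

For each $t \in [T+1]$, define $\Phi_L(t) := \operatorname{Tr} W_L(t)$ and $\Phi_U(t) := \operatorname{Tr} W_U(t)$. For each $t \in [T]$,

\[
\begin{aligned}
\Phi_U(t+1) &= \operatorname{Tr}\bigl(\exp(\gamma A(t))\bigr) = \operatorname{Tr}\bigl(\exp(\gamma A(t-1) + \gamma\alpha B_j)\bigr) \\
&\le \operatorname{Tr}\bigl(\exp(\gamma A(t-1)) \exp(\gamma\alpha B_j)\bigr) \\
&\le \operatorname{Tr}\Bigl(\exp(\gamma A(t-1)) \Bigl(I + \frac{\exp(\gamma\alpha \operatorname{Tr} B_j) - 1}{\operatorname{Tr} B_j}\, B_j\Bigr)\Bigr) \\
&= \frac{\exp(\gamma\alpha \operatorname{Tr} B_j) - 1}{\operatorname{Tr} B_j} \operatorname{Tr}\bigl(\exp(\gamma A(t-1)) B_j\bigr) + \operatorname{Tr}\bigl(\exp(\gamma A(t-1))\bigr) \\
&= \frac{\exp(\gamma\alpha \operatorname{Tr} B_j) - 1}{\operatorname{Tr} B_j}\, \langle W_U(t), B_j \rangle + \Phi_U(t) \\
&\le (1 + \delta_U)\, \Phi_U(t),
\end{aligned}
\tag{16}
\]

where we abbreviated $j := j(t)$ and $\alpha := \alpha(t)$. The first inequality is (14), the second is (15), and the last holds because $(\alpha, j)$ solves (12) with $X_U = W_U(t)$.

Since $A(0) = 0$, we have that $\Phi_U(1) = \operatorname{Tr} I = n$. Using (16), after $T$ iterations,

\[ \Phi_U(T+1) \le (1 + \delta_U)^T n. \]

Thus,

\[ \exp\bigl(\gamma \lambda_{\max}(A(T))\bigr) \le \sum_{i=1}^{n} \exp(\gamma \lambda_i) = \operatorname{Tr} W_U(T+1) = \Phi_U(T+1) \le (1 + \delta_U)^T n, \]

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A(T)$. And so $\gamma \lambda_{\max}(A(T)) \le T \log(1 + \delta_U) + \log n$, which
implies the upper bound in (13). The proof of the lower bound is analogous. □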



Next we establish conditions under which we can construct an oracle for solving the system (12). The
proof consists of algebraic manipulations and an averaging argument analogous to the proof of Lemma 3.5
in [4].

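Theorem 26. Let $B_1, \ldots, B_m \in \mathbb{S}^n_+$ be nonzero such that $\sum_i B_i = I$. Let $\delta_U, \delta_L > 0$ be such that

\[ \frac{1}{\delta_L} - n \ge \frac{1}{\delta_U}. \tag{17} \]

Then, for any $X_L, X_U \in \mathbb{S}^n_{++}$ with trace one, the system (12) has a solution.

Proof. The first inequality in (12) is equivalent to

\[ \frac{\operatorname{Tr} B_j}{\exp(\gamma\alpha \operatorname{Tr} B_j) - 1} \ge \frac{\langle X_U, B_j \rangle}{\delta_U}. \tag{18} \]

Using the identity $\frac{1}{1 - e^{-x}} = 1 + \frac{1}{e^{x} - 1}$, the second inequality in (12) is equivalent to

\[ \frac{\operatorname{Tr} B_j}{\exp(\gamma\alpha \operatorname{Tr} B_j) - 1} \le \frac{\langle X_L, B_j \rangle}{\delta_L} - \operatorname{Tr} B_j. \tag{19} \]

We will choose $j \in [m]$ so that

\[ \frac{\langle X_L, B_j \rangle}{\delta_L} - \operatorname{Tr} B_j \ge \frac{\langle X_U, B_j \rangle}{\delta_U} \tag{20} \]

and set $\alpha$ so that (18) holds with equality. Then both (18) and (19) will hold. Note that $\alpha > 0$ since
$e^{\gamma\alpha \operatorname{Tr} B_j} = 1 + \delta_U \operatorname{Tr} B_j / \langle X_U, B_j \rangle > 1$ and $\gamma \operatorname{Tr} B_j > 0$.

To see that there exists $j \in [m]$ satisfying (20), note that, by (17) and $\sum_i B_i = I$,

\[
\sum_i \left( \frac{\langle X_L, B_i \rangle}{\delta_L} - \operatorname{Tr} B_i \right)
= \frac{\operatorname{Tr} X_L}{\delta_L} - n
= \frac{1}{\delta_L} - n
\ge \frac{1}{\delta_U}
= \frac{\operatorname{Tr} X_U}{\delta_U}
= \sum_i \frac{\langle X_U, B_i \rangle}{\delta_U}.
\]

Hence at least one index $j$ satisfies (20). This concludes the proof. □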

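Finally, let us show how to set the parameters to get a sparsifier. Given $\varepsilon \in (0, 1)$, set

\[ \eta := \varepsilon/2, \quad \delta_U := \frac{\eta}{n}, \quad \delta_L := \frac{\eta}{(1+\eta)n}, \quad T := \frac{n \log n}{\eta^2}. \tag{21} \]

By our choice of $\delta_L$ and $\delta_U$, we have $1/\delta_L - n = (1+\eta)n/\eta - n = n/\eta = 1/\delta_U$, so (17) holds with
equality. After we run the modified version of MMWUM given by Theorem 25, we obtain a matrix $A(T)$.
Set $\bar{A} := A(T)/T$. By Theorem 25,

\[ \lambda_{\max}(\bar{A}) \le \frac{\log(1 + \delta_U)}{\gamma} + \frac{\log n}{T\gamma} \le \frac{1}{\gamma}\Bigl(\delta_U + \frac{\eta^2}{n}\Bigr) = \frac{1 + \eta}{\gamma n/\eta}. \]

We will use that $-\log(1 - x) \ge x$ for $x < 1$. Thus,

\[ \lambda_{\min}(\bar{A}) \ge -\frac{\log(1 - \delta_L)}{\gamma} - \frac{\log n}{T\gamma} \ge \frac{1}{\gamma}\Bigl(\delta_L - \frac{\eta^2}{n}\Bigr) = \frac{\frac{1}{1+\eta} - \eta}{\gamma n/\eta} \ge \frac{1 - 2\eta}{\gamma n/\eta}. \]

So if we choose $\gamma = \eta/n$ then $(1 - \varepsilon)I \preceq \bar{A} \preceq (1 + \varepsilon)I$, and $\bar{A}$ is of the form $\sum_i y_i B_i$ where $y \ge 0$ has
at most $T = O(n \log n/\varepsilon^2)$ nonzero entries.

Remark. The choice of $\gamma$ is actually irrelevant here. We could choose $\gamma > 0$ arbitrarily, then define
$\bar{A} = A(T) \cdot (n\gamma/(\eta T))$ and the desired conclusion would hold.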
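The iteration of Theorem 25, driven by the oracle of Theorem 26 with the parameters in (21), can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions: the function names (`sym_exp`, `width_free_mmwum`, `random_isotropic_rank_one`), the random rank-one test instance, and the use of NumPy are not from the paper, and the oracle picks the index maximizing the slack in (20), which is one valid choice among possibly many.

```python
import numpy as np

def sym_exp(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def width_free_mmwum(B, eps):
    """Sketch of Theorem 25 with the oracle of Theorem 26.

    B has shape (m, n, n); the B[i] are nonzero PSD matrices summing to I.
    Returns (y, T) with y >= 0 and sum_i y[i] * B[i] spectrally close to I.
    """
    m, n, _ = B.shape
    eta = eps / 2.0
    d_U = eta / n                          # delta_U from (21)
    d_L = eta / ((1.0 + eta) * n)          # delta_L from (21)
    T = int(np.ceil(n * np.log(n) / eta**2))
    gamma = eta / n
    tr_B = np.trace(B, axis1=1, axis2=2)
    A = np.zeros((n, n))
    y = np.zeros(m)
    for _ in range(T):
        # Trace-one weight matrices X_U, X_L (system (12) is homogeneous in them).
        X_U = sym_exp(gamma * A)
        X_L = sym_exp(-gamma * A)
        X_U /= np.trace(X_U)
        X_L /= np.trace(X_L)
        ip_U = np.einsum('ijk,jk->i', B, X_U)   # inner products with X_U
        ip_L = np.einsum('ijk,jk->i', B, X_L)   # inner products with X_L
        # Averaging argument: some index satisfies (20); take the best one.
        j = int(np.argmax(ip_L / d_L - tr_B - ip_U / d_U))
        # Choose alpha so that (18) holds with equality.
        alpha = np.log1p(d_U * tr_B[j] / ip_U[j]) / (gamma * tr_B[j])
        A += alpha * B[j]
        y[j] += alpha
    return y / T, T

def random_isotropic_rank_one(m, n, seed=0):
    """Hypothetical test instance: rank-one PSD matrices that sum to I."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, n))
    w, V = np.linalg.eigh(U.T @ U)
    M_inv_half = (V / np.sqrt(w)) @ V.T
    Vn = U @ M_inv_half
    return np.einsum('ij,ik->ijk', Vn, Vn)

B = random_isotropic_rank_one(300, 4)
y, T = width_free_mmwum(B, 0.5)
A_bar = np.einsum('i,ijk->jk', y, B)
lam = np.linalg.eigvalsh(A_bar)
```

With $\gamma = \eta/n$ the analysis above predicts that the eigenvalues of `A_bar` lie in $[1 - \varepsilon, 1 + \varepsilon]$, and `y` can have at most `T` nonzero entries because each iteration touches a single coordinate.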

7 Solving Problem 2 by Pessimistic Estimators

An anonymous reviewer for a preliminary draft of this paper raised the possibility of designing another
deterministic solution to Problem 2. The proposal was to use the pessimistic estimators of Wigderson and
Xiao [42] to derandomize the random sampling approach of Section 3. In this section we show that this
proposal indeed works. We remark that pessimistic estimators were also used by Hofmeister and Lefmann [17]
to derandomize the proof of Theorem 15.

It is known that there is a close relationship between pessimistic estimators and multiplicative weight
update methods. (See, for example, the work of Young [44].) However, the two methods are not identical,
and in particular the algorithm presented in this section is not identical to either of our algorithms based on
MMWUM. To illustrate one difference, notice that the algorithm in Section 3 has the property that its output
vector $y$ has every component $y_i$ equal to an integer multiple of $n/(T \cdot \operatorname{Tr} B_i)$. The algorithm of this section
also has that property as it is a derandomization of the algorithm in Section 3. However, the algorithms in
Sections 4, 5 and 6 do not have that property.

Definition 27 (Definition 3.1 in [42]). Let $X = (X_1, \ldots, X_T)$ be random variables distributed over $[m]$.
Let $S$ be an event with $\mathbb{P}(X \in S) > 0$. We say that $\phi_0, \ldots, \phi_T$, with $\phi_i : [m]^i \to [0, 1]$, are pessimistic
estimators for $S$ if the following hold.

1. For any $i$ and any fixed $x_1, \ldots, x_i \in [m]$, we have that

\[ \mathbb{P}_{X_{i+1}, \ldots, X_T}\bigl( (x_1, \ldots, x_i, X_{i+1}, \ldots, X_T) \notin S \bigr) \le \phi_i(x_1, \ldots, x_i). \]

2. For any $i$ and any fixed $x_1, \ldots, x_i \in [m]$,

\[ \mathbb{E}_{X_{i+1}}\bigl( \phi_{i+1}(x_1, \ldots, x_i, X_{i+1}) \bigr) \le \phi_i(x_1, \ldots, x_i). \]

Note that the function $\phi_0$ depends on no variables and is therefore just a scalar in $[0, 1]$. A nice
property of this definition is that it allows compositions very easily. That is, if we have pessimistic estimators
$\phi_0, \ldots, \phi_T$ and $\psi_0, \ldots, \psi_T$ for events $S$ and $S'$, resp., then $\phi_0 + \psi_0, \ldots, \phi_T + \psi_T$ are pessimistic estimators
for the event $S \cap S'$ (see Lemma 3.3 in [42]).

The key point of this method is that, if there are pessimistic estimators $\phi_0, \ldots, \phi_T$ such that $\phi_0 < 1$ and
each $\phi_i$ can be computed efficiently, then one can find $(x_1, \ldots, x_T) \in S$ efficiently.
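The search procedure implied by the two properties of Definition 27 is the usual greedy one: fix the coordinates one at a time, each time choosing the value that minimizes the next estimator. Property 2 guarantees that the minimum is at most the current value, so the final value stays below 1, and property 1 with $i = T$ then forces the outcome to lie in $S$. The sketch below illustrates this on a toy event; the names `derandomize` and `make_toy_estimator`, and the toy event itself (uniform bits with at least `k` ones, using the exact conditional failure probability as the estimator), are assumptions made purely for illustration and are not the matrix-valued estimators of [42].

```python
from math import comb

def derandomize(phi, m, T):
    """Greedily fix x_1, ..., x_T, minimizing the next pessimistic estimator."""
    prefix = ()
    for _ in range(T):
        # Property 2 of Definition 27: the best extension does not increase phi.
        prefix = min((prefix + (x,) for x in range(m)), key=phi)
    return prefix

def make_toy_estimator(T, k):
    """Toy estimator: X_i uniform on {0, 1}, S = {x : sum(x) >= k}.

    phi(prefix) is the exact conditional probability of missing S, which
    satisfies both properties of Definition 27 with equality.
    """
    def phi(prefix):
        need = k - sum(prefix)      # ones still required
        r = T - len(prefix)         # coordinates that are still random
        # P[Binomial(r, 1/2) is smaller than need]
        return sum(comb(r, s) for s in range(max(need, 0))) / 2**r
    return phi

phi = make_toy_estimator(T=10, k=5)
xs = derandomize(phi, m=2, T=10)
```

Here `phi(())` plays the role of $\phi_0$, and the returned tuple `xs` is a deterministic point of the event.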

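Let $X_1, \ldots, X_T$ be i.i.d. random variables with the same distribution as the random variable $X$ defined
in Section 3. Wigderson and Xiao [42] considered the event

\[ S_{\ge} = \Bigl\{ X : \frac{1}{T} \sum_{i=1}^{T} X_i \succeq (1 - \varepsilon)\mu I \Bigr\} \]

and obtained the following pessimistic estimators:

\[
\begin{aligned}
\phi_0 &= n\, e^{tT(1-\varepsilon)\mu}\, \bigl\| \mathbb{E}_X(\exp(-tX)) \bigr\|^{T} \le n \exp\bigl(-T\varepsilon^2\mu/(2 \ln 2)\bigr); \\
\phi_i(x_1, \ldots, x_i) &:= e^{tT(1-\varepsilon)\mu} \operatorname{Tr}\Bigl( \exp\Bigl( -\sum_{j=1}^{i} t x_j \Bigr) \Bigr) \cdot \bigl\| \mathbb{E}_X(\exp(-tX)) \bigr\|^{T-i},
\end{aligned}
\]

where $t = \log\Bigl( \frac{1 - (1-\varepsilon)\mu}{(1-\mu)(1-\varepsilon)} \Bigr)$. Similarly, for the event $S_{\le} = \bigl\{ X : \frac{1}{T} \sum_{i=1}^{T} X_i \preceq (1 + \varepsilon)\mu I \bigr\}$, one can find
the following pessimistic estimators:

\[
\begin{aligned}
\psi_0 &= n\, e^{-t'T(1+\varepsilon)\mu}\, \bigl\| \mathbb{E}_X(\exp(t'X)) \bigr\|^{T} \le n \exp\bigl(-T\varepsilon^2\mu/(2 \ln 2)\bigr); \\
\psi_i(x_1, \ldots, x_i) &:= e^{-t'T(1+\varepsilon)\mu} \operatorname{Tr}\Bigl( \exp\Bigl( \sum_{j=1}^{i} t' x_j \Bigr) \Bigr) \cdot \bigl\| \mathbb{E}_X(\exp(t'X)) \bigr\|^{T-i},
\end{aligned}
\]

where $t' = \log\Bigl( \frac{(1-\mu)(1+\varepsilon)}{1 - (1+\varepsilon)\mu} \Bigr)$. If we choose $T > (2 \ln 2) \ln(2n)/(\varepsilon^2\mu) = (2 \ln 2)\, n \ln(2n)/\varepsilon^2$, then $\phi_0 + \psi_0 < 1$.
Each $\phi_i, \psi_i$ can be computed efficiently and so one can find in polynomial time $(x_1, \ldots, x_T) \in S_{\ge} \cap S_{\le}$.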

8 Comparing BSS and MMWUM 

In this section we show a striking similarity between the algorithms presented in Sections 4 and 6. The proof
of Theorem 25 defines two potential functions for each iteration $t$.

\[
\begin{aligned}
\Phi_U(t) &:= \operatorname{Tr} W_U(t) = \operatorname{Tr} \exp(\gamma A(t)) \\
\Phi_L(t) &:= \operatorname{Tr} W_L(t) = \operatorname{Tr} \exp(-\gamma A(t))
\end{aligned}
\]

The proof shows that, for the algorithm of Section 6, the potentials must change as follows:

\[
\begin{aligned}
\Phi_U(t+1) &\le (1 + \delta_U)\, \Phi_U(t) && \forall t \in \{0, \ldots, T-1\} \\
\Phi_L(t+1) &\le (1 - \delta_L)\, \Phi_L(t) && \forall t \in \{0, \ldots, T-1\}.
\end{aligned}
\tag{22}
\]

Instead of requiring these potentials to grow and shrink in this way, we could instead parameterize the
potential functions by the iteration number $t$ and then simply require that the potentials do not grow from
iteration to iteration. To formalize this alternative approach, let us define the new potential functions

\[
\begin{aligned}
\Psi^{u}(A) &:= \operatorname{Tr} \exp(-uI + \gamma A), \\
\Psi_{\ell}(A) &:= \operatorname{Tr} \exp(\ell I - \gamma A)
\end{aligned}
\]

and define the parameters $\Delta_U = \ln(1 + \delta_U)$ and $\Delta_L = \ln\bigl((1 - \delta_L)^{-1}\bigr)$.

There was a factor of $n$ in the $\phi_i$ that can be removed.
Algorithm 2 A procedure for solving Problem 2 based on the Width-Free MMWUM method.
procedure SparsifySumOfMatricesByMMWUM($B_1, \ldots, B_m$, $\varepsilon$)
input: Matrices $B_1, \ldots, B_m \in \mathbb{S}^n_+$ such that $\sum_i B_i = I$, and a parameter $\varepsilon \in (0, 1)$.
output: A vector $y$ with $O(n \log n/\varepsilon^2)$ nonzero entries such that $I \preceq \sum_i y_i B_i \preceq (1 + O(\varepsilon))I$.
    Initially $A(0) := 0$, and $y(0) := 0$. Set parameters
        $u_0 := 0$, $\ell_0 := 0$, $\Delta_U := \ln(1 + \delta_U)$, $\Delta_L := \ln\bigl((1 - \delta_L)^{-1}\bigr)$,
    where $\delta_U$, $\delta_L$ and $T$ are as defined in (21).
    Define the potential functions $\Psi^{u}(A) := \operatorname{Tr} \exp(-uI + \gamma A)$ and $\Psi_{\ell}(A) := \operatorname{Tr} \exp(\ell I - \gamma A)$.
    For $t = 1, \ldots, T$
        Set $u_t := u_{t-1} + \Delta_U$ and $\ell_t := \ell_{t-1} + \Delta_L$.
        Find a matrix $B_j$ and a value $\alpha > 0$ such that
            $\Psi^{u_t}(A(t-1) + \alpha B_j) \le \Psi^{u_{t-1}}(A(t-1))$ and $\Psi_{\ell_t}(A(t-1) + \alpha B_j) \le \Psi_{\ell_{t-1}}(A(t-1))$.
        Set $A(t) := A(t-1) + \alpha B_j$ and $y(t) := y(t-1) + \alpha e_j$.
    Return $y(T)/\lambda_{\min}(A(T))$.



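Proposition 28. The inequalities in (22) governing the algorithm's change in potentials are equivalent to
the inequalities in (23).

\[
\begin{aligned}
\Psi^{(t+1)\Delta_U}(A(t) + \alpha B_j) &\le \Psi^{t\Delta_U}(A(t)) \\
\Psi_{(t+1)\Delta_L}(A(t) + \alpha B_j) &\le \Psi_{t\Delta_L}(A(t))
\end{aligned}
\tag{23}
\]

Proof. Obviously (22) is equivalent to

\[
\begin{aligned}
(1 + \delta_U)^{-(t+1)} \cdot \Phi_U(t+1) &\le (1 + \delta_U)^{-t} \cdot \Phi_U(t) && \forall t \in \{0, \ldots, T-1\}, \\
(1 - \delta_L)^{-(t+1)} \cdot \Phi_L(t+1) &\le (1 - \delta_L)^{-t} \cdot \Phi_L(t) && \forall t \in \{0, \ldots, T-1\}.
\end{aligned}
\]

By the definition of $\Phi_U$ and $\Phi_L$, and by properties of the exponential function, these inequalities are
equivalent to

\[
\begin{aligned}
\operatorname{Tr} \exp\bigl(-(t+1)\Delta_U I + \gamma A(t+1)\bigr) &\le \operatorname{Tr} \exp\bigl(-t\Delta_U I + \gamma A(t)\bigr), \\
\operatorname{Tr} \exp\bigl((t+1)\Delta_L I - \gamma A(t+1)\bigr) &\le \operatorname{Tr} \exp\bigl(t\Delta_L I - \gamma A(t)\bigr).
\end{aligned}
\tag{24}
\]

Writing $A(t+1) = A(t) + \alpha B_j$, the inequalities in (24) are equivalent to (23). □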
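The step "by properties of the exponential function" in the proof above rests on the fact that a multiple of the identity commutes with every matrix, so a shift by $-t\Delta_U I$ inside the exponential becomes the scalar factor $(1 + \delta_U)^{-t}$ outside the trace (and similarly for the lower potential). The following numerical check is a minimal sketch of that identity; the random test matrix, the parameter values, and the helper name `tr_exp` are arbitrary choices made for illustration.

```python
import numpy as np

def tr_exp(S):
    """Trace of exp(S) for a symmetric matrix S, via its eigenvalues."""
    return float(np.sum(np.exp(np.linalg.eigvalsh(S))))

rng = np.random.default_rng(1)
n, gamma, d_U, d_L, t = 5, 0.3, 0.05, 0.04, 7
G = rng.standard_normal((n, n))
A = G @ G.T                                  # an arbitrary PSD test matrix
I = np.eye(n)

Delta_U = np.log(1.0 + d_U)
Delta_L = -np.log(1.0 - d_L)                 # ln((1 - delta_L)^{-1})

# Shifted potentials (the parameterized potentials at u = t*Delta_U, l = t*Delta_L).
psi_upper = tr_exp(-t * Delta_U * I + gamma * A)
psi_lower = tr_exp(t * Delta_L * I - gamma * A)

# Rescaled versions of the unshifted trace-exponential potentials.
rescaled_upper = (1.0 + d_U) ** (-t) * tr_exp(gamma * A)
rescaled_lower = (1.0 - d_L) ** (-t) * tr_exp(-gamma * A)
```

The two pairs agree up to floating-point error, which is exactly the rescaling used to pass from (22) to (24).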

Algorithm 2 gives pseudocode for the algorithm of Section 6, using the functions $\Psi^{u}$ and $\Psi_{\ell}$ to control
the change in potentials.

The main point of this section is to observe that Algorithms 1 and 2 are identical with the exception
of different parameters and different potential functions. We believe that this similarity between these two
algorithms is intriguing, especially since the BSS algorithm has been called "highly original" by Naor [28].
In retrospect, it would have been perhaps more natural to develop the BSS algorithm by the following logical
progression of ideas: first observe that MMWUM is useful for giving sparse solutions to SDPs, then design
Algorithm 2, then later realize that a clever refinement of it leads to Algorithm 1 and its improved analysis.
It is remarkable that Batson, Spielman and Srivastava developed their algorithm from first principles,
apparently without knowing this connection to established algorithmic techniques.
With the advantage of hindsight (i.e., the knowledge that the BSS algorithm exists), we now explain how
one might be tempted to refine Algorithm 2. It is quite tempting to modify the potential functions to more
strongly penalize eigenvalues which deviate from the desired range. The natural approach to do this would
be to increase the derivatives of the potential function by increasing the parameter $\gamma$. However, as remarked
at the end of Section 6, the algorithm is actually unaffected by varying $\gamma$! Thus, to improve Algorithm 2,
one must seek a more substantially different potential function.

Focusing on the upper potential, we consider the question: is there a function $f : \mathbb{R} \to \mathbb{R}$ with steeper
derivatives than $\exp(x - u)$ and such that, for any matrices $A$ and $B$, $\operatorname{Tr} f(A + B)$ can be easily related
to $\operatorname{Tr} f(A)$? The natural candidates to try are $f(x) = -\log(u - x)$ and $f(x) = (u - x)^{-1}$ since, in both
cases, $\operatorname{Tr} f(A + B)$ can be related to $\operatorname{Tr} f(A)$ by the Sherman-Morrison-Woodbury formula. We do not
know whether the choice $f(x) = -\log(u - x)$ can be made to work. However, choosing $f(x) = (u - x)^{-1}$,
one arrives at Algorithm 1, our generalization of the BSS algorithm. Of course, even after arriving at this
algorithm, one must also analyze it, and this requires the delicate calculations that were accomplished by
Batson, Spielman and Srivastava.
A Proofs of the Applications

Corollary 4. Let $G = (V, E)$ be a graph, let $w : E \to \mathbb{R}_+$ be a weight function, and let $c_1, \ldots, c_k : E \to \mathbb{R}_+$
be cost functions, with $k = O(n)$. Let $\mathcal{L}_G(w)$ denote the Laplacian matrix for graph $G$ with weight
function $w$. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a subgraph
$H$ of $G$ and a weight function $w_H : E(H) \to \mathbb{R}_+$ such that

\[
\begin{aligned}
& \mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon)\, \mathcal{L}_G(w), \\
& \sum_{e \in E} w_e c_{i,e} \le \sum_{e \in E(H)} w_{H,e}\, c_{i,e} \le (1 + \varepsilon) \sum_{e \in E} w_e c_{i,e} \quad \text{for all } i,
\end{aligned}
\]

and $|E(H)| = O(n/\varepsilon^2)$.

Proof. For every edge $e = ij \in E$, let $B_e$ be the direct sum $w_{ij} \bigl[ (e_i - e_j)(e_i - e_j)^T \oplus c_{1,e} \oplus \cdots \oplus c_{k,e} \bigr]$.
Let $B := \mathcal{L}_G(w) \oplus w^T c_1 \oplus \cdots \oplus w^T c_k$. The result follows immediately by applying Theorem 3 to these
matrices. □
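The direct-sum construction in this proof can be made concrete: each edge contributes one block-diagonal matrix of size $(n + k) \times (n + k)$, and summing over all edges recovers the Laplacian block together with the $k$ scalar blocks $w^T c_i$. The sketch below builds these matrices for a small example; the graph, weights, cost values, and the helper name `edge_block` are hypothetical and serve only as an illustration.

```python
import numpy as np

def edge_block(n, i, j, w, costs):
    """w * [ (e_i - e_j)(e_i - e_j)^T (+) c_{1,e} (+) ... (+) c_{k,e} ]."""
    k = len(costs)
    block = np.zeros((n + k, n + k))
    d = np.zeros(n)
    d[i], d[j] = 1.0, -1.0
    block[:n, :n] = np.outer(d, d)       # Laplacian of the single edge ij
    block[n:, n:] = np.diag(costs)       # one 1x1 block per cost function
    return w * block

# Hypothetical instance: a 4-cycle with one chord and k = 2 cost functions.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
w = np.array([1.0, 2.0, 0.5, 1.5, 3.0])
C = np.array([[1.0, 0.0, 2.0, 1.0, 0.5],     # c_1 on each edge
              [0.0, 3.0, 1.0, 1.0, 2.0]])    # c_2 on each edge

blocks = [edge_block(n, i, j, w[e], C[:, e]) for e, (i, j) in enumerate(edges)]
B_sum = sum(blocks)

# Weighted Laplacian of the whole graph, built independently for comparison.
L = np.zeros((n, n))
for e, (i, j) in enumerate(edges):
    L[i, i] += w[e]
    L[j, j] += w[e]
    L[i, j] -= w[e]
    L[j, i] -= w[e]
```

Each block is positive semidefinite because the costs and weights are nonnegative, so the family is a valid input for a sparsification result on sums of positive semidefinite matrices.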

Corollary 5. Let $G = (V, E)$ be a graph and let $w : E \to \mathbb{R}_+$ be a weight function. Let $E_1, \ldots, E_k$ be
a partition of the edges, i.e., each edge is colored with one of $k$ colors. For any real $\varepsilon \in (0, 1)$, there is a
deterministic polynomial-time algorithm to find a subgraph $H$ of $G$ and a weight function $w_H : E(H) \to \mathbb{R}_+$
such that

\[
\begin{aligned}
& \mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon)\, \mathcal{L}_G(w), \\
& (1 - \varepsilon) \sum_{e \in E_i} w_e \le \sum_{e \in E(H) \cap E_i} w_{H,e} \le (1 + \varepsilon) \sum_{e \in E_i} w_e \quad \text{for all } i,
\end{aligned}
\]

and $|E(H)| = O((n + k)/\varepsilon^2)$.

Proof. For each $i$, let $c_i : E \to \mathbb{R}$ be the characteristic vector of $E_i$. Now apply Corollary 4. □

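Corollary 6 (Spectral sparsifiers for hypergraphs). For any real $\varepsilon \in (0, 1)$, there is a deterministic
polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a weight function $w_{\mathcal{G}} : \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such
that

\[ \mathcal{L}_{\mathcal{H}}(w) \preceq \mathcal{L}_{\mathcal{G}}(w_{\mathcal{G}}) \preceq (1 + \varepsilon)\, \mathcal{L}_{\mathcal{H}}(w), \]

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.

Proof. The result follows directly by applying Theorem 3 to the matrices $w_e \mathcal{L}_e$. □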

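Corollary 7 (Cut sparsifiers for hypergraphs, second definition). For any real $\varepsilon \in (0, 1)$, there is a
deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a weight function $w_{\mathcal{G}} : \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$
such that

\[ w^*(\delta_{\mathcal{H}}(S)) \le w^*_{\mathcal{G}}(\delta_{\mathcal{G}}(S)) \le (1 + \varepsilon)\, w^*(\delta_{\mathcal{H}}(S)) \quad \text{for every } S \subseteq V, \]

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.

Proof. Note that $w^*(\delta_{\mathcal{H}}(S))$ is obtained by evaluating the quadratic form $x^T \mathcal{L}_{\mathcal{H}}(w)\, x$, where $x$ is the
characteristic vector of $S$. Thus the sparsifier produced by Corollary 6 satisfies the desired inequalities. □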
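Corollary 8 (Cut sparsifiers for hypergraphs, first definition). Assume that $\mathcal{H}$ is an $r$-uniform hypergraph.
For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$
and a weight function $w_{\mathcal{G}} : \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such that

\[ \frac{r - 1}{\lfloor r/2 \rfloor \lceil r/2 \rceil}\, w(\delta_{\mathcal{H}}(S)) \le w_{\mathcal{G}}(\delta_{\mathcal{G}}(S)) \le (1 + \varepsilon)\, \frac{\lfloor r/2 \rfloor \lceil r/2 \rceil}{r - 1}\, w(\delta_{\mathcal{H}}(S)) \quad \forall S \subseteq V, \]

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$. In other words, the sparsified hypergraph $\mathcal{G}$ approximates the weight of the cuts in
the hypergraph $\mathcal{H}$ to within a factor $O(r^2)$.

Proof. For any $r$-uniform hypergraph $\mathcal{H}$, it is easy to see that

\[ (r - 1)\, w(\delta_{\mathcal{H}}(S)) \le w^*(\delta_{\mathcal{H}}(S)) \le \lfloor r/2 \rfloor \lceil r/2 \rceil\, w(\delta_{\mathcal{H}}(S)) \quad \forall S \subseteq V. \tag{25} \]

Thus the sparsifier produced by Corollary 6 satisfies the desired inequalities. □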

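Corollary 9 (Cut sparsifiers for 3-uniform hypergraphs). Assume that $\mathcal{H}$ is a 3-uniform hypergraph. For
any $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time algorithm to find a sub-hypergraph $\mathcal{G}$ of $\mathcal{H}$ and a
weight function $w_{\mathcal{G}} : \mathcal{E}(\mathcal{G}) \to \mathbb{R}_+$ such that

\[ w(\delta_{\mathcal{H}}(S)) \le w_{\mathcal{G}}(\delta_{\mathcal{G}}(S)) \le (1 + \varepsilon)\, w(\delta_{\mathcal{H}}(S)) \quad \forall S \subseteq V, \]

and $|\mathcal{E}(\mathcal{G})| = O(n/\varepsilon^2)$.

Proof. Since $r = 3$, a consequence of (25) is that $w^*(\delta_{\mathcal{H}}(S)) = 2\, w(\delta_{\mathcal{H}}(S))$ for every $S$. Thus the sparsifier
produced by Corollary 6 satisfies the desired inequalities. □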

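Corollary 10. Let $A_1, \ldots, A_m$ be symmetric, positive semidefinite matrices of size $n \times n$, and let $B$ be a
symmetric matrix of size $n \times n$. Let $c \in \mathbb{R}^m$ with $c \ge 0$. Suppose that the semidefinite program (SDP)

\[ \min \Bigl\{ c^T z : \sum_i z_i A_i \succeq B,\ z \in \mathbb{R}^m,\ z \ge 0 \Bigr\} \]

has a feasible solution $z^*$. Then, for any real $\varepsilon \in (0, 1)$, it has a feasible solution $\bar{z}$ with at most $O(n/\varepsilon^2)$
nonzero entries and $c^T \bar{z} \le (1 + \varepsilon)\, c^T z^*$.

Proof. Let

\[ B'_i := \begin{pmatrix} z^*_i A_i & 0 \\ 0 & c_i z^*_i \end{pmatrix} \text{ for every } i \in [m], \quad \text{and} \quad B' := \begin{pmatrix} D & 0 \\ 0 & c^T z^* \end{pmatrix}, \quad \text{where } D := \sum_i z^*_i A_i \succeq B. \]

Then $B'_i \succeq 0$ and $B' = \sum_i B'_i$. By applying Theorem 3, we obtain $y \in \mathbb{R}^m$ with $y \ge 0$ and $O(n/\varepsilon^2)$
nonzero entries such that $\sum_i y_i z^*_i A_i \succeq D \succeq B$ and $\sum_i c_i y_i z^*_i \le (1 + \varepsilon)\, c^T z^*$. Thus, we can take $\bar{z}_i = y_i z^*_i$
for every $i \in [m]$. □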

Corollary 11. Let $G = (V, E)$ be a graph. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time
algorithm to find a subgraph $H$ of $G$ such that

\[ (1 - \varepsilon)\, t'(G) \le t'(H) \le t'(G) \]

and $|E(H)| = O(n/\varepsilon^2)$.

Proof. It is straightforward to formulate $t'(G)$ as an SDP (see, e.g., [26]) so that its dual has an optimal
solution and there is no duality gap. The dual can be written as:

\[ \max \Bigl\{ \sum_{e \in E} z_e : \operatorname{Diag}(y) \succeq \mathcal{L}_G(z),\ \sum_{v \in V} y_v = 1,\ z \ge 0 \Bigr\} \tag{26} \]

The proof is now almost identical to the proof of Corollary 10. Let $(z^*, y^*)$ be an optimal solution. Using
Theorem 3, we obtain $\bar{z} \in \mathbb{R}^E$ with $\bar{z} \ge 0$ and $O(n/\varepsilon^2)$ nonzero entries such that $(y^*, \bar{z})$ is feasible
in (26) and has objective value $\sum_{e \in E(H)} \bar{z}_e \ge (1 - \varepsilon)\, t'(G)$, where $H = (V, E(H))$ and $E(H)$ is the
support of $\bar{z}$. Then $\bar{z}$ is also feasible for the SDP defined using $H$ instead of $G$, which shows that $t'(H) \ge
(1 - \varepsilon)\, t'(G)$. □

Corollary 12. Let $G = (V, E)$ be a graph. For any real $\varepsilon \in (0, 1)$, there is a deterministic polynomial-time
algorithm to find a supergraph $H$ of $G$ such that

\[ \frac{\vartheta'(G)}{1 - \varepsilon + \varepsilon\, \vartheta'(G)} \le \vartheta'(H) \le \vartheta'(G) \]

and $|E(H)| = \binom{n}{2} - O(n/\varepsilon^2)$.

Proof. For a graph $G = (V, E)$, define $t(G)$ as the square of the minimum radius of a hypersphere in
$\mathbb{R}^n$ such that there is a map from $V$ to the hypersphere such that adjacent vertices are mapped to points at
distance exactly 1. Lovász [26] noted that $t(G)$ is related to the Lovász theta number $\vartheta(\bar{G})$ of the complement
$\bar{G}$ of $G$ by the formula $2t(G) + 1/\vartheta(\bar{G}) = 1$; see lUl for a proof. By repeating the same proof for
$t'(G)$, one finds that $2t'(G) + 1/\vartheta'(\bar{G}) = 1$. The result now follows from Corollary 11 via this formula. □

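Corollary 13. Let $G$ be a graph such that $\vartheta'(G) = o(\sqrt{n})$. For any real $\gamma > 0$, there is a supergraph $H$ of
$G$ such that

\[ \frac{\vartheta'(G)}{1 + \gamma} \le \vartheta'(H) \le \vartheta'(G) \]

and $|E(H)| = \binom{n}{2} - O(n\, \vartheta'(G)^2/\gamma^2)$.

Proof. Apply Corollary 12 with $\varepsilon := \gamma/\vartheta'(G)$. □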

Corollary 14. Let $G$ be a graph such that $\vartheta'(G) = \Omega(\sqrt{n})$. For any real $\gamma \ge 1$, there is a supergraph $H$
of $G$ such that

and $|E(H)| = \binom{n}{2} - O(n^2/\gamma^2)$.

Proof. Apply Corollary 12 with $\varepsilon := \gamma/\sqrt{n}$. □

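Corollary 16. Let $B_1, \ldots, B_m$ be symmetric, positive semidefinite matrices of size $n \times n$ and let $\lambda \in \mathbb{R}^m$
satisfy $\lambda \ge 0$ and $\sum_i \lambda_i = 1$. Let $B = \sum_i \lambda_i B_i$. For any $\varepsilon \in (0, 1)$, there exists $\mu \ge 0$ with $\sum_i \mu_i = 1$
such that $\mu$ has $O(n/\varepsilon^2)$ nonzero entries and

\[ (1 - \varepsilon)B \preceq \sum_i \mu_i B_i \preceq (1 + \varepsilon)B. \]

Proof. Let

\[ B'_i := \begin{pmatrix} \lambda_i B_i & 0 \\ 0 & \lambda_i \end{pmatrix} \text{ for every } i \in [m] \quad \text{and} \quad B' := \begin{pmatrix} B & 0 \\ 0 & 1 \end{pmatrix}, \]

so that $B'_i \succeq 0$ and $B' = \sum_i B'_i$. By
applying Theorem 3, we obtain $y \in \mathbb{R}^m$ with $y \ge 0$ and $O(n/\varepsilon^2)$ nonzero entries such that $B' \preceq \sum_i y_i B'_i \preceq
(1 + \varepsilon)B'$ or, equivalently, $B \preceq \sum_i y_i \lambda_i B_i \preceq (1 + \varepsilon)B$ and $1 \le \sum_i y_i \lambda_i \le 1 + \varepsilon$. Let $\mu \in \mathbb{R}^m$ be defined
by $\mu_i := y_i \lambda_i / (\sum_j y_j \lambda_j)$. Then $\mu \ge 0$ and $\sum_i \mu_i = 1$, and

\[ (1 - \varepsilon)B \preceq \frac{B}{1 + \varepsilon} \preceq \frac{B}{\sum_i y_i \lambda_i} \preceq \sum_i \mu_i B_i \preceq \frac{(1 + \varepsilon)B}{\sum_i y_i \lambda_i} \preceq (1 + \varepsilon)B. \]

This completes the proof. □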



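Corollary 17. Let $G = (V, E)$ be a graph, let $w : E \to \mathbb{R}_+$ be a weight function, and let $\mathcal{F}$ be a collection
of subgraphs of $G$ such that $\sum_{F \in \mathcal{F}} |V(F)| = O(n)$. For any real $\varepsilon \in (0, 1)$, there is a deterministic
polynomial-time algorithm to find a subgraph $H$ of $G$ and a weight function $w_H : E(H) \to \mathbb{R}_+$ such that
$|E(H)| = O(n/\varepsilon^2)$ and

\[
\begin{aligned}
& \mathcal{L}_G(w) \preceq \mathcal{L}_H(w_H) \preceq (1 + \varepsilon)\, \mathcal{L}_G(w), \\
& \mathcal{L}_F(w_F) \preceq \mathcal{L}_{H \cap F}\bigl(w_H|_{E(H \cap F)}\bigr) \preceq (1 + \varepsilon)\, \mathcal{L}_F(w_F) \quad \text{for all } F \in \mathcal{F},
\end{aligned}
\]

where $w_F := w|_{E(F)}$ is the restriction of $w$ to the coordinates $E(F)$ and $H \cap F = (V(F), E(F) \cap E(H))$.

Proof. For each edge $e \in E$, define $B_e := w_e \bigl[ \mathcal{L}_G(\chi^e) \oplus \bigoplus_{F \in \mathcal{F}} \mathcal{L}_F(\chi^e|_{E(F)}) \bigr]$, where $\chi^e$ denotes the
characteristic vector of $\{e\}$ as a subset of $E$. Now apply Theorem 3. □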

B The MMWUM 

In this section we provide some proofs about the MMWUM. These proofs are due to Kale [22]. Our setup
and conclusions are slightly different and we modified the proofs accordingly. We reproduce the proofs here
for the sake of completeness.

Theorem 22 can be viewed as a block-friendly version of MMWUM. First we show the version with
only one block. It is basically the same as [22, Theorem 13 in Chapter 4].



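Theorem 29. Let $T$ be a positive integer. Let $C, A_1, \ldots, A_m \in \mathbb{S}^n$. Let $\eta > 0$ and $0 < \beta < 1/2$. For any
given $X \in \mathbb{S}^n$, consider the system

\[ \sum_{i=1}^{m} y_i \langle A_i, X \rangle \ge \langle C, X \rangle - \eta \operatorname{Tr} X, \quad \text{and} \quad y \in \mathbb{R}^m_+. \tag{27} \]

Let $\{\mathcal{P}, \mathcal{N}\}$ be a partition of $[T]$, let $0 < \ell \le \rho$, and let $W^{(t)} \in \mathbb{S}^n$ and $\ell^{(t)} \in \mathbb{R}$ for $t \in [T+1]$. Let
$y^{(t)} \in \mathbb{R}^m_+$ for $t \in [T]$. Suppose the following properties hold:

\[
\begin{aligned}
& W^{(t+1)} = \exp\Bigl( -\frac{\beta}{\ell + \rho} \sum_{\tau=1}^{t} \Bigl[ \sum_{i=1}^{m} y^{(\tau)}_i A_i - C + \ell^{(\tau)} I \Bigr] \Bigr), && \forall t \in \{0, \ldots, T\}, \\
& y = y^{(t)} \text{ is a solution for (27) with } X = W^{(t)}, && \forall t \in [T], \\
& \begin{cases} -\ell I \preceq \sum_{i=1}^{m} y^{(t)}_i A_i - C \preceq \rho I, & \text{if } t \in \mathcal{P}, \\ -\rho I \preceq \sum_{i=1}^{m} y^{(t)}_i A_i - C \preceq \ell I, & \text{if } t \in \mathcal{N}, \end{cases} && \forall t \in [T], \\
& \ell^{(t)} = \ell, \ \forall t \in \mathcal{P}, \quad \text{and} \quad \ell^{(t)} = -\ell, \ \forall t \in \mathcal{N}.
\end{aligned}
\]

Define $\bar{y} := \frac{1}{T} \sum_{t=1}^{T} y^{(t)}$. Then

\[ \sum_{i=1}^{m} \bar{y}_i A_i - C \succeq -\Bigl[ \beta\ell + \frac{(\rho + \ell) \ln n}{T\beta} + (1 + \beta)\eta \Bigr] I. \tag{28} \]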



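The main tool for the proof of Theorem 29 is the following result:

Theorem 30 (Kale [22, Corollary 3 in Chapter 3]). Let $0 < \beta < 1/2$. Let $T$ be a positive integer. Let
$\{\mathcal{P}, \mathcal{N}\}$ be a partition of $[T]$, and let $M^{(t)} \in \mathbb{S}^n$ for $t \in [T]$ and $W^{(t)} \in \mathbb{S}^n$ for $t \in [T+1]$ with the
following properties:

\[
\begin{aligned}
& W^{(t+1)} = \exp\Bigl( -\beta \sum_{\tau=1}^{t} M^{(\tau)} \Bigr), && \forall t = 0, \ldots, T, \\
& 0 \preceq M^{(t)} \preceq I, \ \forall t \in \mathcal{P}, \quad \text{and} \quad -I \preceq M^{(t)} \preceq 0, \ \forall t \in \mathcal{N}.
\end{aligned}
\]

Let

\[ P^{(t)} := \frac{W^{(t)}}{\operatorname{Tr} W^{(t)}}, \quad \forall t \in [T]. \]

Then

\[ (1 - \beta) \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle + (1 + \beta) \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) + \frac{\ln n}{\beta}. \tag{29} \]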

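Proof. Set $\Phi^{(t)} := \operatorname{Tr}(W^{(t)})$ for $t \in [T+1]$. Put $\beta_1 := 1 - e^{-\beta}$ and $\beta_2 := e^{\beta} - 1$. Then, for any $t \in [T]$,

\[
\begin{aligned}
\Phi^{(t+1)} = \operatorname{Tr}(W^{(t+1)}) &= \operatorname{Tr}\Bigl( \exp\Bigl( -\beta \sum_{\tau=1}^{t} M^{(\tau)} \Bigr) \Bigr) \\
&\le \operatorname{Tr}\Bigl( \exp\Bigl( -\beta \sum_{\tau=1}^{t-1} M^{(\tau)} \Bigr) \exp\bigl( -\beta M^{(t)} \bigr) \Bigr) \\
&= \operatorname{Tr}\bigl( W^{(t)} \exp(-\beta M^{(t)}) \bigr) \\
&= \langle W^{(t)}, \exp(-\beta M^{(t)}) \rangle,
\end{aligned}
\]

where we have used the Golden-Thompson inequality (14).
Using the fact that $e^{x}$ is convex, one can prove that

\[
\begin{aligned}
0 \preceq A \preceq I &\implies \exp(-\beta A) \preceq I - \beta_1 A, \\
-I \preceq A \preceq 0 &\implies \exp(-\beta A) \preceq I - \beta_2 A.
\end{aligned}
\]

Suppose that $t \in \mathcal{P}$. Then $\exp(-\beta M^{(t)}) \preceq I - \beta_1 M^{(t)}$, and since $W^{(t)} \succeq 0$, we get

\[
\begin{aligned}
\Phi^{(t+1)} \le \langle W^{(t)}, \exp(-\beta M^{(t)}) \rangle &\le \langle W^{(t)}, I - \beta_1 M^{(t)} \rangle \\
&= \operatorname{Tr}(W^{(t)}) - \beta_1 \langle W^{(t)}, M^{(t)} \rangle \\
&= \operatorname{Tr}(W^{(t)}) - \operatorname{Tr}(W^{(t)})\, \beta_1 \langle P^{(t)}, M^{(t)} \rangle \\
&= \operatorname{Tr}(W^{(t)}) \bigl[ 1 - \beta_1 \langle P^{(t)}, M^{(t)} \rangle \bigr] \\
&\le \Phi^{(t)} \exp\bigl( -\beta_1 \langle P^{(t)}, M^{(t)} \rangle \bigr).
\end{aligned}
\]

Similarly, if $t \in \mathcal{N}$, then

\[ \Phi^{(t+1)} \le \Phi^{(t)} \exp\bigl( -\beta_2 \langle P^{(t)}, M^{(t)} \rangle \bigr). \]

By induction on $t$, and using $\Phi^{(1)} = \operatorname{Tr}(I) = n$, we get

\[ \Phi^{(T+1)} \le n \exp\Bigl( -\beta_1 \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle - \beta_2 \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \Bigr). \]

For every $A \in \mathbb{S}^n$, we have $\operatorname{Tr}(\exp(A)) = \sum_{i=1}^{n} e^{\lambda_i} \ge e^{\lambda_j}$ for any $j \in [n]$, where $\lambda_1, \ldots, \lambda_n$ are the
eigenvalues of $A$. Thus,

\[ \Phi^{(T+1)} = \operatorname{Tr}(W^{(T+1)}) = \operatorname{Tr}\Bigl( \exp\Bigl( -\beta \sum_{t=1}^{T} M^{(t)} \Bigr) \Bigr) \ge \exp\Bigl( -\beta \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) \Bigr). \]

Thus,

\[ \exp\Bigl( -\beta \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) \Bigr) \le n \exp\Bigl( -\beta_1 \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle - \beta_2 \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \Bigr). \]

By taking $\ln(\cdot)$ on both sides, we get

\[ -\beta \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) \le \ln n - \Bigl[ \beta_1 \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle + \beta_2 \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \Bigr], \]

so

\[ \beta_1 \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle + \beta_2 \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \le \beta \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) + \ln n, \]

and

\[ \frac{\beta_1}{\beta} \sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle + \frac{\beta_2}{\beta} \sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) + \frac{\ln n}{\beta}. \]

Since $\sum_{t \in \mathcal{P}} \langle M^{(t)}, P^{(t)} \rangle \ge 0$ and $\sum_{t \in \mathcal{N}} \langle M^{(t)}, P^{(t)} \rangle \le 0$, to prove (29) it suffices to show that $1 - \beta \le
\beta_1/\beta$ and $1 + \beta \ge \beta_2/\beta$. It is not hard to prove that

\[ 1 - e^{-x} \ge x(1 - x), \ \forall x \in [0, +\infty) \quad \text{and} \quad e^{x} - 1 \le x(1 + x), \ \forall x \in [0, \tfrac{1}{2}]. \]

So our choice of $\beta_1$ and $\beta_2$ ensures that $1 - \beta \le \beta_1/\beta$ and $1 + \beta \ge \beta_2/\beta$. □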
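The two elementary facts used at the end of this proof, the scalar bounds on $\beta_1/\beta$ and $\beta_2/\beta$ and the operator inequality $\exp(-\beta A) \preceq I - \beta_1 A$ for $0 \preceq A \preceq I$, are easy to sanity-check numerically. The sketch below does so for one value of $\beta$ and one random test matrix; both choices, and all variable names, are arbitrary and purely illustrative, and a numerical check on a grid is of course not a proof.

```python
import numpy as np

beta = 0.3                        # any value in (0, 1/2)
beta1 = 1.0 - np.exp(-beta)
beta2 = np.exp(beta) - 1.0

# Scalar inequalities, checked on a grid of [0, 1/2].
x = np.linspace(0.0, 0.5, 501)
scalar_ok = bool(np.all(1 - np.exp(-x) >= x * (1 - x) - 1e-12)
                 and np.all(x * (1 + x) + 1e-12 >= np.exp(x) - 1))
ratios_ok = bool(beta1 / beta >= 1 - beta and 1 + beta >= beta2 / beta)

# Operator inequality exp(-beta*M) below I - beta1*M for a random M with
# eigenvalues in [0, 1], built from a random orthogonal eigenbasis Q.
rng = np.random.default_rng(0)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
u = rng.uniform(0.0, 1.0, size=n)
M = (Q * u) @ Q.T
exp_neg = (Q * np.exp(-beta * u)) @ Q.T          # exp(-beta * M)
gap = (np.eye(n) - beta1 * M) - exp_neg
min_gap_eig = float(np.linalg.eigvalsh((gap + gap.T) / 2).min())
```

The smallest eigenvalue of `gap` is nonnegative up to rounding, reflecting that the chord of the convex function $x \mapsto e^{-\beta x}$ over $[0, 1]$ lies above the function.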
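We can now show the proof of Theorem 29.

Proof of Theorem 29. Let $M^{(t)} := \frac{1}{\ell + \rho} \bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C + \ell^{(t)} I \bigr]$ and $P^{(t)} := W^{(t)} / \operatorname{Tr} W^{(t)}$ for every $t$.
For every $t \le T$,

\[ \langle M^{(t)}, P^{(t)} \rangle = \frac{1}{(\ell + \rho) \operatorname{Tr} W^{(t)}} \Bigl[ \sum_{j=1}^{m} y^{(t)}_j \langle A_j, W^{(t)} \rangle - \langle C, W^{(t)} \rangle \Bigr] + \frac{\ell^{(t)}}{\ell + \rho} \ge \frac{-\eta}{\ell + \rho} + \frac{\ell^{(t)}}{\ell + \rho}, \]

since $y^{(t)}$ is a solution for (27) with $X := W^{(t)}$. Thus, by (29),

\[ \sum_{t \in \mathcal{P}} \frac{(1 - \beta)(\ell^{(t)} - \eta)}{\ell + \rho} + \sum_{t \in \mathcal{N}} \frac{(1 + \beta)(\ell^{(t)} - \eta)}{\ell + \rho} \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} M^{(t)} \Bigr) + \frac{\ln n}{\beta}. \]

Multiply through by $\ell + \rho$ and move $\ell^{(t)} I$ out of $\lambda_{\min}(\cdot)$:

\[ \sum_{t \in \mathcal{P}} (1 - \beta)\ell^{(t)} + \sum_{t \in \mathcal{N}} (1 + \beta)\ell^{(t)} - T(1 + \beta)\eta \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} \Bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C \Bigr] \Bigr) + \sum_{t=1}^{T} \ell^{(t)} + \frac{(\rho + \ell) \ln n}{\beta}. \]

Thus,

\[ \sum_{t \in \mathcal{P}} -\beta\ell^{(t)} + \sum_{t \in \mathcal{N}} \beta\ell^{(t)} \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} \Bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C \Bigr] \Bigr) + \frac{(\rho + \ell) \ln n}{\beta} + T(1 + \beta)\eta. \]

Next note that $\sum_{t \in \mathcal{P}} -\ell^{(t)} + \sum_{t \in \mathcal{N}} \ell^{(t)} = \sum_{t \in \mathcal{P}} -\ell + \sum_{t \in \mathcal{N}} -\ell = -T\ell$, so

\[ 0 \le \lambda_{\min}\Bigl( \sum_{t=1}^{T} \Bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C \Bigr] \Bigr) + \beta T \ell + \frac{(\rho + \ell) \ln n}{\beta} + T(1 + \beta)\eta, \]

and

\[ 0 \le \lambda_{\min}\Bigl( \frac{1}{T} \sum_{t=1}^{T} \Bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C \Bigr] \Bigr) + \beta\ell + \frac{(\rho + \ell) \ln n}{T\beta} + (1 + \beta)\eta. \]

Thus,

\[ \sum_{i=1}^{m} \bar{y}_i A_i - C = \frac{1}{T} \sum_{t=1}^{T} \Bigl[ \sum_{i=1}^{m} y^{(t)}_i A_i - C \Bigr] \succeq -\Bigl[ \beta\ell + \frac{(\rho + \ell) \ln n}{T\beta} + (1 + \beta)\eta \Bigr] I. \]

□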
Theorem 22 can be easily proved from Theorem 29. First, we apply Theorem 29 separately for each
block. In each iteration, $y^{(t)}$ is a solution for (27) for all blocks simultaneously, and so the conclusion
in (28) holds for all blocks with the same $y$. This new algorithm can be seen as equivalent to running $K$ copies
of MMWUM, each with different input data, with the caveat that all copies run for the same number of
iterations and the vector $y^{(t)}$ returned from the oracle is the same for all copies at each iteration $t$.



C Optimality of MMWUM Oracle 

Proposition I24i Any oracle for satisfying ^ must have p = Q{n/rj), even if the Bi matrices have rank 
one, and even if Xi is a scalar multiple of X2 ■ 
Proof. Let $k = n/3$, let $I_k$ be the identity of size $k \times k$, and let $e_j \in \mathbb{R}^k$ be the $j$th standard basis vector.
Let $\zeta = 3\eta$ and define

Xi = Diag(l, C', » 4, ^2 = Diag(f , l/C^ 1/C) ® 4, 

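where $\otimes$ denotes tensor product. For $j = 1, \ldots, k$, define

\[ v_{1,j} = [1/\sqrt{2},\ -1/\sqrt{2},\ 0] \otimes e_j, \quad v_{2,j} = [1/\sqrt{2},\ 1/\sqrt{2},\ 0] \otimes e_j, \quad v_{3,j} = [0,\ 0,\ 1] \otimes e_j. \]

Let $B_{i,j} = v_{i,j} v_{i,j}^T$. Note that $\sum_{i,j} B_{i,j} = I$.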

The oracle cannot choose a matrix B^j with i G {1,2}, since satisfying ^ would lead to a contradiction: 

{X2,B,) ^^^ {Xi,B,) 



Tr(X2)(l + r?) - a - TY(Xi)(l - 7?) 

^ 1^-^ , , ^ ^ (X2,i?.)/TrX2 ^ 1 + ^ , , , , 
^ l + 3?7 = 1 + C < , . , < :; < l + 3ry, 

for sufficiently small $\eta$.

So the oracle must choose a matrix $B_{i,j}$ with $i = 3$. In this case,



a - TV(Xi)(l-7?) 



n _ n ^ (l + C^ + OA: _ Tr(Si,,) Tr(Xi) ^ /, 



_ = _ < ^ _^ _^^ = ^^2£i ^_L1 < 



9r/ 3C - C (^i,^M> - 1-r? 

This shows that $\rho = \Omega(n/\eta)$. □

D The positive semidefiniteness assumption 

Proposition 31. For every positive integer $n$, there exist matrices $B_1, \ldots, B_m \in \mathbb{S}^n$ with $m = \Omega(n^2)$ such
that $B := \sum_i B_i$ is positive definite and with the following property: for every $\varepsilon \in (0, 1)$ and $y \in \mathbb{R}^m$ such
that $(1 - \varepsilon)B \preceq \sum_i y_i B_i$, all entries of $y$ are nonzero.

Proof. Let $\mathcal{P} := \{ (i, j) : i, j \in [n],\ i < j \}$. For $(i, j) \in \mathcal{P}$, let $E_{ij} := e_i e_j^T + e_j e_i^T$. Let $J$ denote the
matrix of all ones. Then $2I + \sum_{(i,j) \in \mathcal{P}} E_{ij} = I + J =: B \succ 0$. Let $\varepsilon \in (0, 1)$ and suppose that
$(1 - \varepsilon)B \preceq 2tI + \sum_{(i,j) \in \mathcal{P}} z_{ij} E_{ij}$ for some $t \in \mathbb{R}$ and $z \in \mathbb{R}^{\mathcal{P}}$. By taking the inner product with $E_{ab}$ on both
sides, we see that $0 < 2(1 - \varepsilon) \le 2z_{ab}$ for every $(a, b) \in \mathcal{P}$. Similarly, we find that $0 < 2n(1 - \varepsilon) \le 2nt$. □
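The matrix identity and the inner products that this proof relies on can be checked directly for a small $n$. The snippet below is a sketch under the reading of the construction given above (the family consists of $2I$ together with the matrices $E_{ij}$); the choice $n = 5$ and the helper name `E` are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

n = 5
I = np.eye(n)
J = np.ones((n, n))

def E(i, j):
    """E_ij = e_i e_j^T + e_j e_i^T."""
    M = np.zeros((n, n))
    M[i, j] = M[j, i] = 1.0
    return M

pairs = list(combinations(range(n), 2))
B = 2 * I + sum(E(i, j) for i, j in pairs)

# Trace inner products used when pairing both sides with E_ab and with I.
a, b = pairs[0]
ip_B_E = float(np.sum(B * E(a, b)))            # inner product of B with E_ab
ip_E_E = float(np.sum(E(a, b) * E(a, b)))      # inner product of E_ab with itself
ip_E_other = float(np.sum(E(*pairs[1]) * E(a, b)))
ip_I_E = float(np.sum(I * E(a, b)))
ip_B_I = float(np.trace(B))
```

The family has $1 + \binom{n}{2}$ members, which is where the bound $m = \Omega(n^2)$ in the statement comes from.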